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1  Summary 

A  user  (human  or  computer-based  speech  recognizer)  receiving  instructions  from  various 
speech  sources  (human  or  synthesized)  may  be  overwhelmed  when  these  sources  are  speaking 
simultaneously  in  an  unfavorable  acoustical  and  noisy  environment.  In  such  a  case,  the  user 
is  required  to  separate  the  various  sources  from  the  mixture  in  order  to  make  the  speech 
intelligible.  If  no  one  source  dominates  or  the  mixing  occurs  for  a  sustained  period  of  time, 
the  human  user  may  become  mentally  and  physically  overloaded  resulting  in  fatigue  and 
thus  failing  to  separate  the  various  speech  sources  into  intelligible  signals.  In  the  case  of  the 
speech  recognizer,  recognition  accuracy  may  be  degraded  to  unacceptable  levels. 

In  this  research,  we  have  gained  an  understanding  of  the  source  separation  problem  for 
convolutional  mixtures  and  developed  expertise  with  the  Rrequency-Domain,  Second-Order 
Statistics  (FDSOS)  based  blind  speech  separation  (BSS)  algorithms.  As  a  starting  point,  we 
have  implemented  a  recently-developed  version  called  the  Multiresolution  Frequency-Domain 
(MFRD)  algorithm  and  analyzed  its  performance.  We  have  developed  several  modifications 
to  the  MRFD  algorithm  for  better  performance  at  reduced  cost,  implemented  it,  and  evalu¬ 
ated  it  under  a  wide  set  of  noise  and  acoustic  environments  in  both  simulated  and  real-world 
tests.  Modifications  include  methods  for  exploiting  frequency-domain  symmetries  in  order 
to  reduce  computation  by  50%  and  initialization  strategies  for  improved  performance.  We 
have  experimentally  determined  optimal  parameters  for  the  algorithm  and  studied  the  ef¬ 
fects  of  room  size  and  liveliness  on  separation  performance.  Simulations  with  both  digitally- 
filtered  and  mixed  speech  signals  as  well  as  real-world  signals  indicate  good  performance 
with  this  algorithm  with  10-20dB  improvement  to  the  Signal-to-Interference  Ratio  (SIR)  of 
these  sources. 

In  the  important  case  of  speech  enhancement,  i.e.  blind  suppression  of  background  noise 
from  a  desired  speech  signal,  we  have  been  able  to  increase  Signal-to-Interference  Ratio 
Improvement  (SIR!)  upwards  of  20dB.  This  may  have  significant  impact  on  the  problem 
of  automatic  speech  recognition  in  noisy  environments.  Our  work  has  also  demonstrated 
a  robustness  to  separating  speech  signals  in  the  presence  of  noise.  The  potential  of  these 
algorithms  for  real-time  processing  has  been  examined  and  we  have  concluded  that  good 
performance  is  possible  but  with  at  least  a  2sec.  delay  due  to  the  inherent  “block  processing” 
nature  of  the  algorithm.  Finally,  we  have  developed  a  blind  speech  signal  corpus  or  database 
containing  speedi  signal  mixtures  in  a  variety  of  acoustical  environments.  This  database  is 
now  available  on  the  Internet  at 

http :  //WWW .  ece .  nmsu .  edu/"pdeleon/BSS/index .  html 
for  free  download. 
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2  Introduction 


In  many  audio-interface,  multimedia,  and  speech  recognition  applications,  mixtures  of  speech 
signals  from  various  competing  acoustic  sources  (other  speakers  and  noise)  must  be  separated 
out  before  processing  [1],  Given  the  complicated  nature  of  speech  signals  (non-stationary, 
overlap  in  time-  and  frequency-domains,  etc. . . )  this  is  a  difficult  problem  compounded 
by  environmental  effects  such  as  noise  and  reverberation  and  a  strong  desire  for  a  simple 
algorithm  suitable  for  real-time  operation.  Several  methods  have  been  proposed  some  of 
which  have  shown  moderate  success  at  various  aspects  of  the  problem  but  often  at  the 
expense  of  high  computational  complexity  [2], [3]. 

The  basic  problem  (two  channel  case)  is  illustrated  in  Fig.  1  and  described  as  follows. 
Let  si,S2  be  two  unknown  source  speech  signals  and  xi,X2  be  two  mixture  signals  each 
composed  of  the  sum  of  the  acoustically-filtered  (echoed  and  reverberated)  source  signals; 
the  impulse  response  from  source  i  to  microphone  j  is  given  by  hj,.  The  problem  is  this: 
given  only  the  received  signals  (mixtures  of  filtered  speech),  xi,X2,  produce  two  signals  yi,y2 
which  approximate  the  individual  source  signals.  Note  we  have  no  information  regarding  the 
original  sources  or  acoustical  filters.  The  “blind”  designation  denotes  the  fact  we  are  given 
no  a  priori  information  to  use  in  the  separation  task. 

yi  yi 


Figure  1:  Block  diagram  of  speech  separation  problem. 


The  mixture  signals  in  Fig.  1  can  be  expressed  as  a  convolution,  i,e., 

Xj{n)  =  Y^f^hji{p)si{n-p),j  =  1,2  (2.1) 

1=1  p=0 


where  hji{p)  models  the  P-point  impulse  response  from  source  i  to  microphone  j.  The  two 
mixture  signals  in  (2.1)  can  be  stacked  in  vector  form  and  written  as 


ii(n) 

X2(n)  _ 

hnin)  hi2{n) 

si(n) 

h2i{n)  /i22(n) 

S2(n) 

(2.2) 


or  simply  as. 


x(n)  =  H(n)*s(n), 


(2.3) 
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where  x(n)  =  [a:i(n)  0:2(71)]^  is  the  vector  of  received  mixture  signals,  H(n)  is  the  2x2  mixing 
filter  matrix  whose  elements  are  *  is  the  convolution  operator,  and  s(n)  =  [si(n)  S2(n)]^ 
is  the  source  signal  vector.  The  aim  of  the  BSS  method  is  to  find  a  2  x  2  un-mixing  filter 
matrix,  W(n),  where  each  element  is  of  length  Q,  that  separates  the  two  sources  up  to  an 
arbitrary  filter  and  permutation,  i.e. 

yi(n)  ^  wu{n)  wnin)  ^  xi{n)  , 

.  2/2(71)  J  [  7021(71)  7022(71)  [  X2(7i)  _  '  ■ 

or  simply  as, 


y(n)  =  W(n)*x(n),  (2.5) 

where  y(n)  =  (^1(77)  2/2(71)]^  is  the  vector  of  output  signals  approximating  the  original 
sources,  and  W(7i)  is  the  2x2  un-mixing  filter  matrix  whose  elements  are  Wji.  The  “permu¬ 
tation”  designation  is  to  express  the  fact  that  either  permutation  y(n)  =  [2/1(77)  2/2(71)]^  or 
y(n)  =  [2/2(71)  2/1(71)]^  is  an  acceptable  result.  The  overall  block  diagram  is  given  in  Fig.  2. 


Source  signals, 

s(  ) 


Unknown,  Acoustical  Un-mixing  Filters 

Filtering  (to  be  built) 


x(  )  (given) 


Output  signals, 

y( ) 

(source  signal  estimates) 


Figure  2:  Block  diagram  of  Blind  Speech  Separation  setting. 
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3  Methods,  Assumptions,  and  Procedures 


3.1  Introduction 

To  date,  one  of  the  most  successful  methods  for  the  blind  separation  of  speech  signals  in 
reverberant  environments  is  that  recently  proposed  by  Parra  and  Spence  and  further  im¬ 
proved  by  Ikram  and  Morgan  [3,  4].  Both  of  these  algorithms  have  their  roots  in  the  work 
by  Cardoso  and  are  based  on  Frequency-Domain,  Second  Order  Statistics  (FDSOS).  Here 
we  attempt  to  decorrelate  the  speech  signals  from  one  another  in  the  frequency  domain, 
i.e.  diagonalize  covariance  matrices.  Since  the  original  speech  signals  are  nonstationary  and 
uncorrelated  but  not  independent,  decorrelation  is  enough  to  theoretically,  produce  sepa¬ 
rated  sources.  The  methods  are  frequency-domain-based  so  that  a  difficult  time-domain 
deconvolution  involving  the  acoustical  filters  can  be  avoided. 


3.2  Frequency-Domain,  Second  Order  Statistics-Based  Algorithms 

We  first  begin  by  transforming  the  time-domain  convolutive  mixture  vector,  x(n)  in  (2.3) 
to  a  frequency-domain,  instantaneous  mixture  by  computing  its  T-point  short-time  Fourier 
transform  (STFT) 

a:i(u;,m) 

X2{u,m) 

or 


ifii(a;)  HM 


Si(w,Tn) 

S2{u},m) 


(3.1) 


x(a;,m)  =  H(u;)s(aJ,m) 


(3.2) 


where  x(iJ,m)  =  [a:i(a;,m)  X2(cJ,m)]^  is  the  STFT  of  the  mixture  signal  vector,  s(a;,m)  = 
[si(a;,  m)  £2(^1  STFT  of  the  source  signal  vectors,  H(a;)  is  the  matrix  of  acoustical 

frequency  responses  with  the  response  from  source  i  to  microphone  j  ,  m  is  the  block 

index,  and  ideally,  T  =  2P.  For  a  given  set  of  received  data,  x(n),  n  =  0, . . . ,  iV  —  1,  we 
obtain  the  STFT  as 

x(a;,  wi)  =  ^  tt;(r)x(/5Tm  -{■  r)e“-’^’'/^,  (3.3) 

r=0 

for  w  =  1,...,T  and  m  =  0, . . . , A^/(/3T)  —  1,  where  w(t)  is  a  window  function  and  ^ 
(0  <  /d  <  1)  is  the  data  overlap  factor.  Fig.  3  illustrates  the  block  structure  of  STFT  and 
associated  notation.  Similarly  (2.3)  transforms  to 


or 


yi(w,m) 

jfe(a;,m) 

W2M 


Wi2{uj) 

W22iu) 


Xi{u},m) 

X2(a>,m) 


(3.4) 


y(u),m)  =  W(a;)x(a;,m)  (3.5) 

where  y(ai,rn)  =  [yi{uj,7n)  y2iuJ,m)]'^  is  the  STFT  of  the  output  signal  vector  and  W(w) 
is  the  matrix  of  un-mixing  filter  frequency  responses  with  Wji{u)  the  response  of  the  jith 


4 


X  n 


Figure  3:  Block  structure  of  STFT. 


un-mixing  filter.  We  employee  the  overlap-add  method  [5]  to  S5mthesize  the  separated  time- 
domain  speech  signals  as 

u-i 

y  W  =  5Zy(”  m),  n  =  0, . . . ,  iV  -  1  (3.6) 

m=0 

where 

Hr,m.)  =  r  =  0....,T-l  (3.7) 

a;=0 

is  the  inverse  STFT,  In  the  synthesis  given  by  (3.6),  we  have  windowed  blocks  spaced  by  /3T 
samples  in  time  and  overlapped  and  added  to  produce  the  output  signals. 

We  assume  that  the  source  signals  are  quasi-stationary,  i,e,  statistics  are  slowly  varying 
with  time.  Thus,  we  can  treat  the  speech  signals  as  a  succession  of  super  blocks  of  stationary 
signals,  and  analyze  their  spectrum  on  a  time-var3dng  basis.  The  assumption  of  uncorre- 
latedness  of  source  signals  Si(n)  and  S2(n)  can  therefore  be  represented  in  the  time-domain 
by  a  cross-correlation  of  zero  or  in  the  frequency-domain  by  a  Cross-Power  Spectral  Density 
(CPSD)  of  zero.  The  Power  Spectral  Density  (PSD)  matrix  of  source  vector,  s  at  frequency 
u  will  then  have  following  diagonal  form 

Rs{iv,k)  = 


where  x  is  some  non-zero  value  representing  either  the  PSD  of  si(n)  or  S2{n)  and  k  is  the 
superblock  index. 

Under  the  assumption  of  uncorrelated  source  signals,  we  seek  to  build  frequency  by 
frequency,  an  un-mixing  filter  matrix,  W(a>)  that  decorrelates  the  output  signals  yi{n)  and 


E  ^s(a;,  m)s^(a;,  m)] 
0 


X 

0 


(3.8) 
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y^in)  at  each  frequency  uj.  Thus  when  applied  to  x(a;,  k),  we  have  a  diagonal  PSD  matrix 
for  the  output  signals  given  by 

Ry(tJ,A:)  =  f;[y(a;,m)y^(a;,m)] 

=  W(tj)E  [^x(a;,Tn)x^(a  .  ' ■W^(uj) 

=  W(a;)Rx(a;,A:)W"(a;).  (3.9) 

Here  we  consider  y(n)  as  the  output  of  a  linear  shift  invariant  (LSI)  system,  W(n)  with 
quasi-stationary  input  signals,  s(n).  The  frequency-domain,  covariance  matrix,  Rx(a),A:), 
can  be  estimated  (also  assuming  ergodicity  in  each  superblock)  with  an  average  of  the  outer 
products  of  the  received  signal  vectors 

1 

Rx(£i;,  fc)  =  —  ^x{u},Mk  +  m)x^{u),Mk  +  m),  (3.10) 

^  m=0 

for  k  =  0, . . . ,  A'  -  1  provided  the  number  of  blocks  in  each  super  block,  M,  is  large  enough. 
Note  that  in  (3.10),  the  frequency-domain  data  is  averaged  over  M  =  N/{KI3T),  possibly 
overlapping,  consecutive  blocks  to  obtain  the  covariance  at  a  super-block  index  k. 

The  decorrelation  objective  is  therefore  to  find  a  sequence  of  un-mixing  filter  matrices, 
W(tj)  which  minimizes  the  CPSD  at  each  frequency  or  effectively  diagonalizes  (3.9)  as  much 
as  possible  for  each  frequency  u  and  each  superblock  k.  The  filters  are  then  applied  to 
x(a;,  m)  resulting  in  (hopefully)  decorrelated  and  thus  separated  output  signals. 

The  second-order  decorrelation  criterion  alone  does  not  provide  enough  conditions  to 
solve  for  W(u;),  unless  the  number  of  outputs  is  twice  the  number  of  inputs  (four  outputs 
for  the  two-input  case)  [3].  However,  for  non-stationary  signals,  we  can  write  independent 
decorrelation  equations  for  K  sufficiently  separated  time  intervals.  As,  shown  in  [3],  a  set  of 
K  equations  give  a  total  of  KT  constraints  on  the  AQ  unknown  coefficients  of  W(tj).  In 
order  to  build  an  over-determined,  least-squares  problem  (from  which  there  will  be  at  least 
one  solution),  it  is  required  that  KT  >  4Q.  The  approximation  of  linear  convolution  (2.1)  by 
circular  convolution  (3.1)  requires  P  ■^T,  and  therefore  Q  <^T.  However,  this  constraint 
has  to  be  relaxed.  For  example,  in  a  typically  sized  room,  P  is  always  on  the  order  of  10^ 
at  the  sampling  rate  of  8kHz.  But  the  r  Smation  of  the  PSD  as  well  as  the  computational 
complexity  require  that  T  not  to  be  to<  .  rge,  typically,  less  than  10^.  In  managing  all  these 
constraints,  we  always  choose  Q  <T,  and  set  >  4  to  avoid  the  under-determined  case. 

The  un-mixing  filter,  W(a;)  for  each  frequency  bin  u  (u  —  1, . . .  ,T)  that  simultaneously 
satisfies  the  K  decorrelation  equations  is  obtained  using  an  over-determined  least-squares 
solution  which  seeks  to  r  iinimize  the  off-diagonal  elements  of  (3.9).  In  this  framework,  the 
un-mixing  filter  matrix  iii  given  by 

W(a;)  =  argrain  ^||V(a;,fc)|||.,  (3.11) 

where  1|  •  is  the  Probenius  norm  squared  (sum  of  squares  of  all  elements)  and  the  off- 
diagonal  matrix  is  given  by 

V(u),k)  =  W(a;)RxW"(ai)  -  diag  [W(u;)RxW»(a;)] ,  (3.12) 
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where  diag  [■]  is  the  diagonal  matrix  formed  by  extracting  the  diagonal  elements  of  the  matrix 
argument.  The  least-squares  solution  to  (3.11)  can  be  searched  for  using  the  well-known 
steepest  descent  algorithm 

W<‘*‘>(u)  =  W(‘)(a.)  -  {S  (3.13) 

for  u;  =  1, . . . ,  T.  In  this  equation,  we  iteratively  update  the  filter  matrix  W(a;)  by  adjusting 
it  in  the  direction  which  leads  to  the  “smallest”  V(a;,  A:),  i.e.  smallest  off-diagonal  values  of 
(3.9).  In  Appendix  A,  we  provide  the  complete  derivation  for  the  gradient  term  in  (3.13). 
Substituting  the  result  of  the  derivation  into  (3.13)  leads  to  the  filter  update 

K 

W('+i)(u;)  =  W<')(a;)-//(a;)£;V(w,fc)W(a;,fc)l^(a;,A:).  (3.14) 

fc=i 

In  [3],  it  is  noted  that  the  gradient  terms  scale  with  the  square  of  the  signal  powers,  which 
vary  considerably  across  frequency  and  result  in  wild  fluctuation  of  the  gradient  terms  for 
different  frequencies.  Thus,  we  use  a  step  size  function 

<'(“)  =  Ic - - -  (315) 

k=l 

where  p,  is  a.  normalized  step  size.  To  fix  the  arbitrary  scaling  in  W((j),  only  the  off-diagonal 
elements  of  W(a;)  are  updated,  thus  diagonal  elements  always  remain  at  their  initial  values. 

3.3  The  Permutation  Inconsistency/Spectral  Resolution  Tradeoff 

If  W(u;)  diagonalizes  l^(ci;,A:),  i.e. 

then  W'(a;)  =  PW(u.’),  with  P  a  permutation  matrix,  also  diagonalizes  Rx(ct;,  k) 


Note  that  W(a;)  and  its  permutation  W'(a;)  will  lead  to  the  same  cost,  J(u})  at  each  fre¬ 
quency  u.  Therefore  the  least-squares  solution  may  consist  of  a  different  permutation  of 
W(a;)  at  each  frequency  bin,  causing  arbitrary  permutations  in  the  separated  signals  over 
the  spectrum.  Fig.  4  illustrates  the  permutation  problem.  Here  the  spectral  components 
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Sourcel 


- ^<0 

(a)  Consistent  permutations 


yiC®, 


Sourcel 


- ^<0 

(b)  Inconsistent  permutations 


Figure  4:  Permutations  across  the  spectrum  of  the  output  signals. 


while  uncorrelated  at  each  frequency,  are  not  properly  collected  at  the  same  channel  output, 
i.e.  mixed  in  the  frequency-domain. 

In  [3],  this  problem  is  solved  by  constraining  the  length  of  W(n)  to  Q  -C  T,  thereby 
forcing  the  solutions  to  be  continuous  or  “smooth”  in  the  frequency-domain.  This  constraint 
is  applied  by  starting  the  gradient  algorithm  with  an  initial  value  of  W(a>)  that  satisfies 
W(n)  =  0  for  Q  <  n  <  T  —  1,  and  then  following  the  constrained  gradient,  which  is  a 
projection  of  the  gradient  term  in  (3.13)  with  the  projection  operator 

G  =  FZF-^  (3.18) 

where  F  is  the  Discrete  Fourier  Transform  (DFT)  matrix  and  Z  is  the  truncation  matrix 
given  by 


Zij 


1,  %  —  jj  i  ^  Q 
<  0,  i  =  j,  i>Q 
.  0,  i  j. 


(3.19) 


The  projection  is  implemented  by  first  transforming  the  gradient  to  the  time-domain,  zeroing 
all  components  for  n  >  Q,  and  transforming  back  to  the  frequency-domain.  The  main 
problem  with  this  solution  to  the  permutation  problem  is  the  limited  size  of  un-mixing 
filters.  It  is  well-known  that  good  separation  performance  requires  long  filters  to  accurately 
model  the  inverse  of  the  acoustical  impulse  response.  In  [3],  it  was  proposed  that  a  large 
Q  (filter  size)  can  be  achieved  by  increasing  the  block  size  T.  However,  as  described  in  the 
previous  section,  T  cannot  exceed  10^  for  a  sampling  rate  of  8kHz.  Thus,  as  shown  in  [4], 
Q  is  limited  to  a  value  on  the  order  of  10^  in  order  to  minimize  the  permutation  problem. 
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However,  with  this  size  of  Q  there  is  insufficient  spectral  resolution  in  frequency-domain.  The 
permutation-inconsistency/spectraJ-resolution  tradeoff  is  pictured  in  Fig.  5  with  varying  Q 
and  fixed  T  =  2048.  As  one  can  see  the  best  Signal-to-Interference  Ratio  Improvement 
(SIRI),  formally  defined  in  the  next  section,  is  13dB  and  achieved  with  Q  =  400. 


Figure  5:  Average  SIRI  versus  un-mixing  filter  length  Q. 


3.4  A  Multiresolution  Approach  for  FDADF 

A  multiresolution  frequency-domain  (MRFD)  algorithm  was  proposed  by  Ikram  and  Morgan 
to  satisfy  the  conflicting  requirements  of  permutation  alignment  and  spectral  resolution  [4]. 
In  the  multistage  procedure,  the  FDSOS  method  of  (3.14)  is  used  with  increasing  values 
of  filter  length  Q  at  each  stage  of  the  algorithm.  Different  values  of  Q  imply  varying  the 
frequency-domain  resolution  of  W(a;)  at  each  stage,  hence  the  name  “multiresolution.”  The 
rationale  behind  such  an  approach  is  to  allow  the  permutations  to  align  themselves  using  a 
smaller  value  of  <5  <<  T  in  the  early  stages  of  the  algorithm.  Once  the  permutations  are 
aligned,  they  tend  to  retain  their  order  even  if  the  value  of  Q  is  increased.  The  increase  in 
the  value  of  Q,  however,  provides  the  desired  spectral  resolution  which  is  lacking  in  the  early 
stages. 

A  block  diagram  illustrating  the  blind  separation  of  speech  signals  using  the  MRFD 
algorithm  is  shown  in  Fig.  6.  The  mixing  stage  is  followed  by  S  un-mixing  stages,  each 
of  which  attempts  to  further  separate  the  sources  using  an  un-mixing  filter  with  increased 
spectral  resolution.  Note  that  the  separated  output  signals  from  each  un-mixing  stage  (after 
convergence)  at  each  stage  are  carried  over  as  the  initial  weights  for  the  following  stage.  To 
initiate  the  separation  procedure,  —  I2  is  used  in  the  first  stage.  Since  the  un-mixing 
of  the  speech  signals  is  carried  out  by  S  different  sets  of  weights,  the  SIRo  for  the  MRFD 
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algorithm  can  be  computed  using  the  overall  multichannel  response 

AH  =  W,(a;)W,_i{a;)...Wi(a;)HH 


(3.20) 


in  place  of  H(lj)  in  (3.24). 


1  2  si  s 


S 


Figure  6:  Block  diagram  of  Multiresolution  Frequency  Domain  blind  speech  separator. 

It  is  well-known  that  a  blind  estimate  of  W(u))  at  frequency  u/  can  be  best  obtained  up 
to  a  scale  and  a  permutation.  Therefore,  at  each  frequency  u,  the  separated  signal  Si(w) 
may  have  Si(tt>)  =  7Si(u))  or  Si(u;)  =  ^62(0^),  where  7  is  an  arbitrary  scaling  factor,  and 
the  second  possibility  arises  from  a  simple  interchange  of  the  rows  of  the  un-mixing  filter 
matrix  W(u;).  Consequently,  the  recovered  source  signal,  Sj  is  not  necessarily  a  consistent 
estimate  of  Sj  over  eill  frequencies.  What  occurs  is  that  each  “separated”  output  contains 
scaled,  spectral  energy  at  particular  frequencies  from  the  other  source  and  vice  versa.  In  [3], 
this  problem  was  solved  by  constraining  the  length  of  W(£j)  to  Q  <T,  thereby  forcing  the 
solution  to  be  smooth  or  continuous  in  the  frequency  domain.  However,  due  to  insufficient 
spectral  resolution,  actual  SIRJ  is  modest. 

In  [6],  the  authors  explore  SIRI  for  various  choices  of  Q.  As  mentioned  above,  for  Q  < 
T,  insufficient  spectral  resolution  limits  actual  SIRI.  Although  choosing  Q  ^  T  provides 
sufficient  spectral  resolution,  sufficient  continuity  of  the  un-mixing  filters  in  the  frequency 
domain  is  not  achieved.  A  sub-optimal  choice  of  Q,  however,  can  be  found  which  provides  a 
good  compromise  between  the  two  competing  objectives  although  performance  is  still  modest 
and  below  theoretical  SIRI  when  Q  f^T  and  permutations  of  W (tj)  are  perfectly  aligned. 

Experimental  results  indicate  good  performance  with  only  two  stages  where  Q2  is  ap¬ 
proximately  equal  to  the  reverberation  time  of  the  room  in  sample  periods  and  (as  a  rule  of 
thumb),  Qi  «  <32/4.  A  typical  reverberation  time  of  250ms  at  an  8kHz  sampling  rate  would 
give  <52  =  2000.  When  rounded  up  to  a  power  of  two  for  a  convenient  choice  of  block  length 
and  FFT  implementation,  we  have  Q2  =  2048. 

3.5  Modifications  to  MRFD 

In  this  section,  we  discuss  several  modifications  to  the  MRFD  algorithm  based  on  our  own 
independent  research  which  improve  computational  efficiency  and  performance.  Results 
with  using  the  modified  MRFD  algorithm  are  compared  to  the  original  algorithm  in  the 
next  section. 
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3.5.1  Real- Valued  Constraint  on  the  Un-mixing  Filters 

The  coefficients  of  the  acoustical  impulse  responses,  H(n)  will  always  be  real- valued  numbers 
and  thus  the  un-mixing  filter  coefficients,  W(n)  will  also  real- valued.  Exploiting  this  fact 
can  reduce  the  computational  complexity  by  half  since  from  the  the  Hermitian  property  of 
the  DFT,  we  have 


W(tj)  =  W*((5-w)  (3.21) 

for  ai  =  0, . . . ,  Q  —  1-  Therefore  we  need  only  solve  the  least  squares  problem  in  (3.11)  for 
2(Q  +  1)  unknown  parameters  given  K{T  +  l)/2  constraints  in  (3.9). 

3.5.2  Reinitialization  of  W(u;)  at  Each  Stage 

In  [4],  the  un-mixing  filters  W,(a;)  are  initialized  at  each  stage  with  the  converged  coefficients 
of  Wgf!\(tj)  from  the  previous  stage — a  technique  known  as  “cascaded  initialization”.  For 

the  first  stage,  we  set  Wi°^(t(;)  =  I.  However,  the  algorithm  is  not  directly  optimizing  the  un¬ 
mixing  filters,  but  instead  minimizing  the  CPSDs.  Also,  the  gradient  algorithm  may  reach 
a  local  minima  at  interim  stages.  Therefore,  reinitializing  W^“^(a;)  may  help  the  algorithm 
to  reach  different  local  minima  at  the  qth  stage  and  “save”  the  improvement  achieved  in 
current  stage.  In  order  to  evaluate  the  objective  function  J{uj)  at  each  iteration,  we  measure 
the  diagonalization  of  Ry(tJ,  k)  with 

O  =  -  (3.22) 

fc=l  ur=^0  i=l 

where  k)  is  the  ijth  element  of  the  PSD  matrix  of  output  defined  in  (3.9). 

Fig.  7  compares  the  learning  curves  of  the  cascaded  initializing  strategy  (two  dash  lines 
with  -i-)  [4]  and  those  of  the  reinitialization  approach  (two  solid  lines)  with  the  measurement 
in  (3.22)  for  the  two-stage  MRFD.  The  lower  lines  (solid  line  and  dashed  line  are  overlapped) 
are  the  learning  curves  for  the  first  stage,  and  the  upper  lines  are  the  learning  curves  for 
the  second  stage.  We  can  see  in  the  case  of  reinitialization  O  is  “saved”  at  the  end  of  the 
first  stage  and  is  continuously  improved  at  the  second  stage.  The  “save-attack”  behavior  is 
thought  to  be  due  to  the  update  of  Roc(w,  k)  at  the  beginning  of  each  stage.  A  quantitative 
comparison  of  SIRI  is  also  given  in  Table  3. 

3.5.3  Projection  of  the  Un-mixing  Filters  W(uj) 

As  explicitly  stated  in  [7],  the  projection  G  (3.18)  is  implemented  by  transforming  the 
gradient  into  the  time-domain,  zeroing  all  components  with  r  >  Q,  and  transforming  back 
to  the  frequency  domain.  Before  truncating  the  gradient  in  the  time-domain,  we  have  to  first 
normalize  it  by  J2k=:i  order  to  stabilize  the  algorithm.  For  simplicity  in  the 

Matlab  implementation  of  the  MRFD  algorithm,  we  instead  apply  the  projection  operator 
P  directly  to  the  un-mixing  filters  W(a>)  after  each  update,  Our  results  show  that  both 
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Figure  7:  Comparison  of  two  initializing  strategies. 


projection  methods  achieve  the  same  SIRI  performance  due  to  the  linearity  property  of  the 
DFT. 

We  summarize  the  MRFD  algorithm  with  the  above  modifications  in  Fig.  8. 


3.6  Performance  Metrics 

Researchers  evaluate  the  performance  of  blind  speech  separation  and  speech  enhancement 
algorithms  in  various  ways.  These  include  speech  recognition  rates  [3];  plots  of  separated 
signals  [8],  [9],  [10];  and  analysis  of  the  improvement  to  the  Signal-to-Interference  Ratio  after 
processing  [3],  [4].  In  this  work,  we  have  exclusively  measured  algorithm  performance  using 
the  SIRI  since  this  is  most  easily  computed  measure  and  allows  easy  comparison  with  other 
algorithms.  For  further  information  on  how  improvements  to  the  SIR  impacts  automatic 
speech  recognition  (ASR)  see  [llj. 

The  average  input  signal-to-interference  ratio  (SIRj)  is  defined  as  the  ratio  of  the  total 
signal  power  obtained  via  direct  channels  (/iii,/i22)  to  the  total  signal  power  obtained  via 
cross  channels  (/121,  ^12),  i.e. 

SlRi  =  - .  (3.23) 

EE  E 

01=0  1=1  j=l 

Replacing  H(a>)  by  W(a;)H(u>)  similarly  defines  the  average  post-processed,  or  output  signal- 
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x(t<;,  k)  =  Y,t=o  'U){T)x{ml3T  +  Short-Time  Fourier  Transform 

For  9  =  1 :  5G 

Rx(a;,  A:)  =  I2m=o  x(u;,  Mk  +  m)'x.^{uj,  Mk  -{■  m);  Estimate  PSD 
W{u,k)  =  OS[R^{  w,  fc)j ;  Build  cost  matrix 

"  SL'ifellir  Normalize  step  size 
For  1  =  1 :  L 

C  =  Off[V(a;,  /;:)W^*^(a;)l^(c4;,  k)]\  Build  correction  matrix 

—  /i(a>)C;  Update  W{u))  at  each  frequency 
—  tu)  =  Apply  symmetric  constraint 

G  =  FZgF“^;  Projection  operator 

Apply  length  constraint  on  un-mixing  filters 

End 

(cu,  m)  =  W (w,  m);  Current  stage  separation 

End 

y(r,  m)  =  p 

y(n)  =  Em=o  y(^  “  mPT,m)',  Synthesize  time-domain  signals 


Figure  8:  Modified  MRFD  Algorithm  for  speech  separation, 
to-interference  ratio  (SIRo). 

SIRo  =  -  (3.24) 

ES  E  ||W(i.')H(*.,)lyf|sj{w)|'‘ 

CJ=0  i=l 

The  objective  of  blind  speech  separation  algorithms  is  to  obtain  a  high  SIR  improvement 
(SIRI)  given  by  the  ratio  of  (3.23)  to  (3.25) 

SIR!  =  SIRo/SIRi.  (3.25) 

In  [12],  Schobben  et  al  divided  the  suite  of  test  cases  into  two  main  categories: 

1.  Controllable  Synthetic  Separation  Problems 

In  this  case,  the  channel  impulse  responses  and  source  signals  are  known  a  prior  due  to 
the  simulated  approach  to  testing.  In  our  experiments,  we  simulate  the  impulse  responses 
from  an  acoustic  source  to  multiple  microphones  using  the  image  method  described  in  [13]. 
The  simulated  room  is  shown  in  Fig.  9.  The  image  method  requires  the  dimensions  of 
the  virtual  room;  coordinates  of  each  source,  (a;^.,  coordinates  of  each  microphone, 

{xmi,ymi,  Zmi)',  and  reflection  coefficients  of  the  six  walls,  pi,  I  =  1, ...  ,6.  Parameters  listed 
in  Table  1  are  used  in  our  simulations  except  otherwise  specified.  For  testing  purposes, 
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Figure  9;  Room  geometry  (coordinate  units  in  meter). 


speech  mixtures  were  digitally  synthesized  according  to  (2.1)  using  speech  sources  drawn 
from  the  Linguistic  Data  Consortium’s  (LDC’s)  Texas  Instruments,  Massachusetts  Institute 
of  Technology  (TIMIT)  speech  corpus  and  filtered  with  impulses  simulated  with  the  image 
method.  This  speech  corpus  consists  of  short  utterances  from  a  large  sample  of  American 
English  speakers.  For  more  information  on  this  speech  database  see  [14]. 


Table  1:  Simulated  room  parameters. 


Parameters 

Values  (in  meters) 

Source  1  coordinates 

(2.13,  0.91,  1.52) 

Source  2  coordinates 

(3.35,  0.91,  1.52) 

Microphone  1  coordinates 

(2.23,  2.74,  1.68) 

Microphone  2  coordinates 

(2.83,  2.74,  1.68) 

Room  dimensions 

(5.06,  3.41,  2.44) 

Reflection  coefficient,  p 

(0.3,  0.3,  0.3,  0.3,  0.3,  0.3) 

2. Real-World  Recording  Problems 

Here,  original  source  signals  and  impulse  responses  are  not  available.  However,  we  can  still 
estimate  direct-channel  signal  power  (numerator)  and  cross-channel  signal  power  (denom¬ 
inator)  in  (3.23)  by  using  alternating  source  signals  [3].  We  estimate  the  contributions  of 
source  i  while  source  i  is  “on”  and  all  other  sources  are  “off”.  For  a  two-channel  mix.  Fig.  10 
shows  this  approach.  In  this  case,  we  have  four  mixed  signals  and  four  separated  signals, 
including  the  direct-channel  contributions  (arn)  3^22, 1/11,1/22)  and  the  cross-channel  contribu¬ 
tions  (xi2,a:2i,  j/i2,2/2i)-  We  can  measure  the  SIRI  as  in  (3.25)  where,  in  this  approach,  we 
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3.7  Modified  MRFD  Implementation 

The  speech  separation  code  contains  eight  different  .c  files  and  one  .h  (header)  file.  The  first 
and  main  file  is  the  speech.c.  The  program  needs  the  standard  stdio.h,  stdlib.h  and  math.h 
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to  compile  correctly.  Several  of  the  lower-level  math  and  signal  processing  routines  can  be 
found  in  [15]. 


•  speech. c  -  Contains  the  mainO  function  and  is  where  variable  initialization,  memory 
allocation,  and  calls  to  other  functions  take  place.  This  is  the  basis  from  where  the 
algorithm  is  processed. 

•  complex. c  -  This  file  contains  all  the  math  functions  for  complex  arithmetic  as  well 
as  defines  the  structure  for  complex  numbers.  The  functions  that  are  defined  are  as 
follows. 


cmplxO  -  Converts  real  numbers  to  complex 

conjgO  -  Conjugate  of  complex  number 

caddO  -  Complex  addition 

csubO  -  Complex  subtraction 

cmulO  -  Complex  multiplication 

rmulO  -  Multiplication  by  real 

cdivO  -  Complex  division 

rdivO  -  Division  by  real 

realO  -  Real  part  of  complex  number 

aimagO  -  Imaginary  part  of  complex  number 

cexpO  -  Complex  Exponential 


•  cmplx .  h  -  Defines  of  all  the  complex  functions 

•  f  f  t .  c  -  In-place  decimation-in-time  (DIT)  Fast  Fourier  Transform  (FFT) 

•  if  ft .  c  -  Inverse  Fast  Fourier  Transform  (IFFT) 

•  bitrev .  c  -  Does  bit  reversing  of  a  B-bit  integer,  used  by  the  Fourier  'Dansforms 

•  dftmerge.c  -  Discrete  Fourier  TVansform  (DFT)  for  radix-2  DIT  FFT 

•  shuffle.c  -  In-place  shufiling  (bit-reversal)  of  a  complex  array 


•  swap .  c  -  Swap  two  complex  numbers 


The  change  of  the  C  implementation  from  a  non-real-time  model  to  a  real-time  model 
would  require  a  few  foreseeable  changes  in  the  code.  There  are  a  couple  loops  that  operate 
on  the  entire  signal,  the  normalization  for  example.  The  parameters  for  these  loops  will  need 
to  be  modified  for  a  real-time  implementation.  Some  loops  will  need  to  be  eliminated  as  well 
and  replaced  with  control  code  for  which  block  of  samples  it  is  operating  on.  The  control 
mechanisms  for  the  loading  and  operating  of  samples  in  buffer  blocks  will  also  need  to  be 
added.  For  a  real-time  implementation  the  code  would  most  likely  need  to  be  optimized  as 
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Table  2;  Execution  times  for  processing  50s  long  speech  signals  using  C-implementation  of 
Modified  MRFD.  _ 


real 

0m20.551s 

user 

0ml9.980s 

sys 

0m0.420s 

well,  there  are  many  places  in  the  non-real-time  model  where  this  can  be  done.  In  general 
the  core  algorithm  in  the  C  code  implementation  is  sound  and  would  most  likely  only  require 
other  code  to  be  wrapped  around  it. 

Benchmarks  for  processing  two  50s  long  speech  signals  (8kHz  sample  rate)  are  listed  in 
Table  2.  Processing  times  were  determined  with  the  following  instruction 

time  ./speech  mixl.txt  mix2.txt 

The  test  PC  has  dual  1.2GHz  AMD  Athlon  MP  Processors,  1GB  of  DDR  RAM,  Red  Hat 
Linux  7.2  (Kernel  2.4.16).  The  codes  were  compiled  under  gcc-2.96  with  glibc-2.2.4.  We 
note  that  the  program  is  not  multithreaded  so  it  has  no  benefit  of  a  dual  processor  machine. 
Code  is  compiled  with  Athlon  optimizations. 

3.8  NMSU  BSS  Speech  Corpus 

In  this  work,  we  developed  a  speech  corpus  in  order  to  standardize  evaluation  and  testing  of 
this  and  future  speech  separation/speech  enhancement  algorithms.  This  corpus  is  available 
(free  of  charge)  over  the  Internet  at 

http :  //www .  ece .  nmsu .  edu/"pdeleon/BSS/index .  html 

The  corpus  contains  a  collection  of  impulse  responses  and  speech  recordings  from  various 
environments  and  is  more  extensive  than  other  available  databases  specifically  targeted  at 
BSS.  Also  included  on  the  website  are  more  detailed  descriptions  of  recording  procedure, 
environment,  speakers,  etc —  Figs.  11  and  12  show  sample  screen  shots  of  the  web  pages  for 
the  NMSU  BSS  speech  corpus.  Here  we  briefly  described  collection  procedure  and  available 
recordings. 

3.8.1  Impulse  Responses  for  Various  Acoustic  Environments 

In  order  to  provide  researchers  the  ability  to  synthesize  signals  in  various  acoustic  environ¬ 
ments,  impulse  responses  were  measured  for  various  environments.  For  each  environment, 
a  pair  of  direct  channel  impulse  responses  (left  source  to  left  microphone  and  right  source 
to  right  microphone)  and  a  pair  of  cross  channel  impulse  responses  (left  source  to  right  mi¬ 
crophone  and  left  source  to  right  microphone)  were  measured  as  in  Fig.  13.  The  impulse 
responses  are  only  accurate  to  within  a  delay  since  the  stimulus  used  to  excite  the  room  re¬ 
sponse  was  not  synchronized  to  the  recorder.  In  order  to  simplify,  all  responses  begin  50ms 
before  the  main  peak  (direct  path)  and  extend  a  total  of  3sec.  Typically,  the  cross  channel 
responses  have  a  longer  delay  than  the  direct  channel  since  the  path  is  longer. 

Recording  Equipment  Used  in  All  Impulse  Response  Measurements 
The  following  recording  equipment  was  used  in  collection  of  impulse  responses. 
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IntrcNlactjon  to  the  Bliod  Source  (Speech)  Sqtaration  Corpus 

The  Blind  Source  Separ wion  (ESS;  corpi*  U  «  collection  of  digits'  recorjinfs  of  mixturei  of  epeech  signals  arri 
backgromi  rote  fn  various  accustic  tn'/ironiEefjU  These  reiloyorl j  rewrdings  can  fce  used  to  evaluate  various 
ESS  and  other  speech  enhancement  algorhhms  using  ananded  test  set  In  addition  impute  responses  for  various 
a»uni:  envL'-onments  are  available  for  use  tn  corttrclleJ  simulation  of  different  "mixihg  environtnentr  ‘ 

The  recordings  were  nade  under  a  gran:  from  the  Air  Force  Research  L  abxatories  and  arc  freely  available  for 
dov.’nload. 


tHmt !  CaniottlmtiraatiBti  ITtjctjMl  IPtr>:>il  i 

3  i3  Qi*  K3  <a^  twr*/d5*r»,;s;  SiS-gj* 

Figure  11:  Main  web  page  for  the  NMSU  Blind  Speech  Corpus. 

•  (1)  Dell  Dimension  XPS  R400  PC  (Pentium  II  @  400MHz,  128MB  RAM,  40GB  disk, 
Windows  98)  (www. dell.com) 

•  (1)  Echo  Layla  20-bit  multitrack  recording  system  (www.) 

•  (2)  Shure  omni-directional  microphone  Model  VP64A2  (wwwr.shure.com) 

•  (2)  Applied  Research  And  Technology  (ART)  Professional  processor  series  tube  preamp 
(www.art.com) 

•  (2)  Balanced  (XLR)  microphone  cables  from  microphones  to  preamps 

•  (2)  Balanced  (XLR)  microphone  cables  from  preamps  to  Layla 
Recording/ Editing  Software 

The  following  software  was  used  in  collection  of  impulse  responses. 

•  Syntrillium  Software  Corporation’s  Cool  Edit  Pro  vl.2  for  recording  and  (www.syntrillium.com) 

•  Aurora  Convolve  with  Clipboard  and  Generate  Log  Sweep  plug-ins  for  Cool  Eldit  pro 
for  impulse  response  measurements  (www.ramsete.com/aurora/) 

•  SoundApp  PPC  v2.7.3  for  sample  rate  conversion  to  16kHz  from  48kHz  (www.download.com) 


I  tothtESSC 
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Figure  12:  Sample  web  page  for  the  the  lecture  hall  recording  environment. 
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Figure  13:  Direct  and  cross  channel  impulse  responses. 


Room  Impulse  Response  Measurement 

The  Aurora  plug-in  for  Cool  Edit  Pro  utilizes  a  Chirp  (sinusoidal  sweep)  stimulus  in  order 
to  compute  the  room  impulse  response  which  is  sampled  at  48kHz.  Impulse  responses  are 
also  provided  at  a  16kHz  sample  rate. 
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Anechoic  Chamber 

The  anechoic  chamber  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch 
School  of  Electrical  and  Computer  Engineering  (Thomas  and  Brown,  Room  311).  Fig.  14 
shows  the  schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files 
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Figure  14:  Schematic  of  Anechoic  Chamber  recording  setup. 


are: 


Anechoic_ImpResp_16k_LL .  wav 
Anechoic_ImpResp_16k_RL .  wav 
Anechoic_ImpResp_48k_LL .  wav 
Anechoic_IinpResp_48k_RL .  wav 


Anechoic_ImpResp_16k_LR .  wav 
Anechoic_ImpResp_16k_RR.wav 
Anechoic_ImpResp_48k_LR .  wav 
Anechoic_ImpResp_48k_RR .  wav 


Bathroom 

The  men’s  bathroom  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch 
School  of  Electrical  and  Computer  Engineering  (Goddard  Hall,  Room  101).  Fig.  15  shows 
the  schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files  are: 


Bathroom_InipResp_16k_LL .  wav 
Bathrooni_InipResp_16k_RL .  wav 
Bathroom.  ImpResp_48k_LL .  wav 
Bathroom.  ImpResp_48k_RL .  wav 


Bathroom.  ImpResp.lSk.LR .  wav 
Bathroom.  ImpResp_16k_RR .  wav 
Bathroom.  ImpResp.48k_LR .  wav 
Bathroom.  ImpResp_48k.RR .  wav 


Large  Room  (Conference  Room) 

The  large  room  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch  School 
of  Electrical  and  Computer  Engineering  (Goddard  Hall,  Room  167).  Fig.  16  shows  the 
schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files  are 
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Figure  15;  Schematic  of  Bathroom  recording  setup. 
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Figure  16;  Schematic  of  Large  Room  recording  setup. 


LEirgeRoom_ImpResp_16k_LL .  wav 
LargeRoom_ImpResp_16k_RL .  wav 
LargeRooin_IjnpResp_48k^LL .  wav 
LargeRoom_IinpResp_48k_RL .  wav 

Lecture  Hall 


LargeRoom.ImpResp. 1 6k_ 
LargeRoom_ ImpResp_ 16k_ 
LargeRo  om_ ImpRe sp_4 8k_ 
LargeRo  om_  InjpRe  sp_4  8k_ 
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The  lecture  hall  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch  School 
of  Electrical  and  Computer  Engineering  (Thomas  and  Brown,  Room  104).  Fig.  17  shows  the 
schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files  are 

(.,) 


00 

Figure  17:  Schematic  of  Lecture  Hall  recording  setup. 

Lecture_ImpResp_  1 6k_LL .  vav  Lectur  e_  ImpResp.  1 6k_LR .  wav 

Lecture_ImpResp_16k_RL.wav  Lecture_ImpResp_16k_RR.wav 

Lecture_ImpResp_48k_LL.wav  Lecture_ImpResp_48k_LR.wav 

Lecture_ImpResp_48k_RL .  wav  Lecture_ImpResp_48k_RR .  wav 

Small  Room  (Office) 

The  Small  Room  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch 
School  of  Electrical  and  Computer  Engineering  (Goddard  Hall,  Room  171).  Fig.  18  shows 
the  schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files  are 

SmallRooBi_ImpResp_16k_LL .  wav  SmallRoom_ImpResp_16k_LR .  wav 

SmallRoom_ImpResp_16k_RL.wav  SmallRooin_ImpResp_16k_RR.  wav 
SmallRoom_InipResp_48k_LL .  wav  SmallRoom_ImpResp_48k_LR .  wav 

Sinai  lRooni_ImpResp_48k_RL .  wav  SmallRooin_InipResp_48k_RR.  wav 

Stairwell 

The  stairwell  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch  School 
of  Electrical  and  Computer  Engineering  on  the  second  floor.  Fig.  19  shows  the  schematic  of 
recording  setup  with  dimensions.  The  available  impulse  response  files  are 

Stairwell.  ImpResp.  16k_LL .  wav  Stairwell_ImpResp_16k_LR .  wav 

Stairwell_ImpResp_16k_RL .  wav  Stairwell_IinpResp_16k_RR.wav 

Stairwell_IinpResp_48k_LL .  wav  Stairwell_ImpResp_48k_LR .  wav 

Stairwell_ImpResp_48k_RL .  wav  Stairwell_ImpResp_48k_RR .  wav 
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Figure  18;  Schematic  of  Small  Room  recording  setup. 
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Figure  19:  Schematic  of  Stairwell  recording  setup. 


Study  Lounge 

The  study  lounge  is  located  on  the  New  Mexico  State  University  campus  in  the  Klipsch 
School  of  Electrical  and  Computer  Engineering  (Thomas  and  Brown,  Room  102).  Fig.  20 
shows  the  schematic  of  recording  setup  with  dimensions.  The  available  impulse  response  files 
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Figure  20:  Schematic  of  Study  Lounge  recording  setup. 


are 


Study. ImpResp_16k_LL . wav  Study. ImpResp_16k_LR . wav 
Study. ImpRe  sp_ 1 6k_RL . wav  Study. ImpRe sp_ 16k_RR . wav 
Study_IinpResp_48k_LL . wav  Study. ImpResp_48k_LR . wav 
Study. ImpRe  sp.48k.RL . wav  Study. ImpResp.48k_RR . wav 
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4  Results  and  Discussion 

In  this  section,  we  report  results  of  our  research.  These  results  include 

•  Optimal  parameter  selection  for  the  modified  MRFD  algorithm  and  resulting  SIRI 
results 

•  Algorithm  performance  in  terms  of  various  Simulated  room  sizes  and  reverberation 
times 

•  Algorithm  performance  when  mixture  signals  have  additive  noise  present 

•  Algorithm  performance  and  processing  times  for  short  mixture  signal  lengths  and  im¬ 
pact  on  real-time  operation 

•  Performance  of  separating  (enhancing)  one  speech  signal  from  a  background  of  additive 
noise. 

4.1  Selection  of  Optimal  Parameters  for  the  MRFD  Algorithm 

We  a)nsider  the  selection  of  optimal  parameters  for  the  modified  MRFD  algorithm.  The  four 
most  important  parameters  that  affect  the  separation  performance  are  the:  block  length  (or 
maximum  filter  length),  T;  length  of  the  un-mixing  filter,  Wij  at  each  stage  given  by  the  set 
Q  =  {Qi, . . . ,  Qsg}',  adaptive  step  size,  p.;  and  number  of  iterations,  L.  Generally,  the  longer 
the  un-mixing  filter  is,  ultimately  the  better  the  SIRI.  However,  the  permutation  problem 
constrains  Q  <T.  Generally,  we  choose  at  the  last  stage,  a  maximum  un-mixing  filter  length 
equal  to  the  block  length,  Qsg  =  T.  The  parameter  set  for  the  multistage  method  includes 
the  number  of  stages  {SG)  and  the  filter  length  Qi  at  each  stage. 

Careful  selection  of  the  step  size  is  necessary  to  obtain  good  performance  from  gradient- 
based  adaptive  algorithms  used  in  decorrelation  and  speech  separation  tasks  [16].  In  this 
context,  the  step  size  controls  the  convergence  rate  and  the  misadjustment,  which  form 
a  tradeoff  pair.  Fast  convergence,  i.e.  convergence  with  smaller  L,  requires  a  bigger  step 
size,  which  may  lead  to  large  misadjustment.  On  the  contrary,  we  can  achieve  much  better 
SIRI  using  a  smaller  step  size  but  at  a  cost  of  decreased  convergence  speed.  The  number 
of  iterations,  L,  is  one  of  the  main  computational  bottlenecks  in  the  MRFD  algorithm. 
We  ideally  want  to  choose  the  smallest  value  L  in  order  to  achieve  fast  separation  while 
maintaining  acceptable  SIRI.  There  are  also  other  parameters  that  play  less  important  roles 
in  the  MRFD  algorithm.  These  are 

•  Initial  value  of  un-mixing  filter  matrix,  W(°)(a;)  —  An  adaptive  system  is  stable  in  the 
sense  that  it  can  be  driven  to  the  steady  state  without  regard  to  the  choice  of  initial 
status.  However,  the  objective  function,  J{uj)  has  local  minima,  therefore  we  set  the 
initial  un-mixing  matrices  W^°^(a;)  to  be  identity  matrices  over  frequencies  as  is  typical 
in  the  literature. 

•  Number  of  iterations,  K  —  Building  an  over-determined  least  squares  problem  requires 
the  number  of  superblocks,  A  >  4  in  order  to  generate  enough  equations.  Note  that 
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larger  K  can  guarantee  better  separation  performance  due  to  better  (longer-term) 
statistical  measures  of  the  mixtures  as  shown  in  Fig.3.  However,  K  also  depends  on 
N,  the  length  of  the  input  mixtures.  For  a  fixed  N,  K  has  to  be  properly  chosen  to 
give  good  average  estimates  of  the  PSD  matrix,  Rx(a;,  k).  In  most  of  our  experiments, 
the  input  signals  have  a  duration  of  7  —  14  sec,  accordingly,  we  choose  A  <  K  <1 
(assuming  an  8kHz  sample  rate). 

•  Overlap  factor,  yd  —  The  required  rate  of  the  properly  sampled  short-time  spectrum 
representation  needs  to  be  2  -  4  times  higher  than  for  the  equivalent  TD  representation 
of  the  speech  signal  itself  [5].  Thus,  we  set  the  overlap  factor  /3  =  0.5  for  Hamming 
window. 

•  Total  number  of  blocks,  U,  and  number  of  blocks  in  each  superblock,  M  —  These  two 
parameters  are  determined  by  the  selection  of  the  other  parameters  N,  /3,  T,  and  K 
with  U  =  N/{/3T)  and  M  =  U/K. 

4.1.1  Block  Length  T 

An  optimal  T  is  desired  to  satisfy  the  short-time  statistical  assumptions  regarding  the  speech 
signals  while  maintaining  high  spectral  resolution  of  the  un-mixing  filters  (in  our  setting 
Qsg  —  T).  In  order  to  measure  the  SIRI  with  varying  block  length  under  the  same  mixing 
process,  we  make  use  of  metrics  given  in  (3.26)  and  (3.27)  for  real-world  recordings  because 
the  metrics  given  in  (3.23)  and  (3.24)  require  P  =  T,  i.e.  varying  mixing  filter  length  at 
the  same  time.  In  our  experiment,  we  randomly  choose  20  pairs  of  speech  signals  from  the 
TIMIT  speech  corpus.  Each  pair  is  mixed  through  the  simulated  room  impulses,  shown  in 
Fig.  21,  with  mixing  filter  length  2048.  We  fix  other  parameters  as  p  =  1.0,  L  =  100,  and 
update  the  multistage  parameter  set  according  to  Q  =  (r/4,  T]^.  Fig.  22  shows  the  SIRI  in 
dB  with  T  varying  over  the  range  128  to  16384.  We  find  the  optimal  block  length  T  =  2048. 


4.1.2  Multistage  Parameter  Set  Q 

Here,  we  fix  the  block  length  T  =  2048,  the  adaptive  step  size  =  1.0,  and  randomly  choose 
13  combinations  of  SG  and  Qg.  The  average  SIRI  of  20  pairs  of  speech  mixtures  with  the 
sources  randomly  selected  from  TIMIT  is  measured  for  each  case  and  listed  in  Table.  3. 
There  are  two  sets  of  experimental  results  for  each  case:  the  result  in  the  first  row  of  each 
cell  are  the  average  SIRIs  of  MRFD,  while  in  the  second  row  are  average  SIRIs  of  modified 
MRFD  listed  in  italic  font.  One  can  see  that  the  modifications  tc  MRFD  result  in  separation 
performance  that  is  l-2dB  better  than  that  of  the  original  MR.FD  at  every  stage  for  each 
case.  Furthermore,  the  optimal  performance  is  obtained  in  the  two-stage  case  with  Qi  =  500, 
and  Q2  =  2048  through  modified  MRFD  algorithm.  Again  we  note  these  modifications  not 
only  improve  SIRI  but  require  no  additional  complexity  (modification  #2)  and  reduce  cost 
(modification  #1). 
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Table  3:  Average  SIR!  for  different 


Stage  2 

Stage  3 

Qi 

Qa 

SIRI 

Qs 

SIRI 

B!SI 

(dB) 

(dB) 

300 

15.83 

nsi 

im 

14.82 

500 

14.76 

mSMil 

iwapi 

iWiiifcl 

1024 

14.66 

■ 

mil 

17.31 

ms 

500 

16.12 

1024 

14.49 

17.70 

15.97 

300 

15.83 

2048 

18.27 

17.31 

19.49 

500 

16.12 

2048 

18.41 

17.70 

liteJfefe'i 

700 

16.07 

2048 

lligM 

17.38 

300 

15.84 

500 

|i|jy 

1024 

14.28 

17.31 

16.50 

16.01 

500 

16.12 

1200 

14.41 

2048 

18.24 

17.70 

15.72 

19.44 

500 

16.12 

800 

14.57 

1000 

14.16 

17.70 

16.09 

15.61 

256 

15.35 

800 

14,60 

1024 

14.22 

16.50 

15.88 

15.48 

300 

15.84 

500 

14.79 

700 

14.38 

17.31 

mmfim 

16.14 

500 

16.12 

800 

14.57 

1000 

14.16 

17.70 

16.09 

15.61 
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Figure  21:  Simulated  room  impulse  responses. 


Figure  22;  Average  SIRI  versus  block  length  T. 


4.1.3  Adaptive  Step  Size  p. 

Fixing  T  =  2048,  and  Q  =  [500  2048],  we  examine  the  SIRI  with  varied  adaptive  step  size 
p  in  the  range  from  10“®  to  2.0.  The  results  are  shown  in  Fig.  23.  As  one  can  see  that  the 
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optimal  adaptive  step  size  is  achieved  at  /t  =  1;0.  Also,  from  Fig.  24,  it  is  easy  to  see  that  if 
fi  is  too  small,  the  adaptive  algorithm  takes  too  long  to  converge  as  shown  in  the  first  four 
plots.  On  the  other  hand,  if  fi  is  too  large  (>  1.0),  the  algorithm  may  become  unstable.  This 
leads  to  failure  as  can  be  seen  in  the  last  two  plots  of  Fig.  24. 


Figure  23:  Average  SIRI  versus  adaptive  step  size  fi. 


4.1.4  Number  of  Iterations  L 

It  is  well-known  that  for  the  MRFD  algorithm,  more  filter  update  iterations  will  lead  to  a 
higher  SIRI.  However,  a  large  L  requires  more  computational  time,  which  limits  tracking 
a  time- varying  environment  and  real-time  applications.  Fig.  25. (a)  shows  the  SIRI  plot 
mth  increasing  L  and  other  parameters  fixed  as  T  =  2048,  Q  =  [256  2048],  and  fi  =  1.0. 
Fig.  25b  shows  the  processing  time  versus  the  number  of  iterations  L  (Matlab  6.0,  Dell 
Precision  Workstation  330  with  1.8GHz  P4  and  IGB  RAM).  In  the  simulations  we  are 
mainly  interested  in  the  SIRI  performance  with  acceptable  processing  time,  we  therefore 
choose  L  =  100. 

4.1.5  Summary 

Optimal  parameters  for  the  modified  MRFD  that  yield  good  separation  performance  at  rea¬ 
sonable  computational  cost  are  shown  in  Table  4.  Unless  otherwise  noted,  these  parameters 
are  used  in  all  the  following  experiments. 
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Figure  24:  SIRI  versus  iteration  index  with  different  step  size  pL. 


Table  4:  Simulation  parameters 


Parameter 

Value 

Overlap  factor,  /? 

0.5 

Number  of  superblocks,  K 

6 

Number  of  iterations,  L 

100 

Normalized  step  size,  p 

1.0 

Multistage  parameter  set,  Q 

[500,  2048]'-*' 

Block  size,  T 

2048 

Initial  un-mixing  filter  matrix  for  all  stages, 

1  0 

0  1 

4.2  Results  on  the  Effect  of  Room  Size  and  Reverberation  Time 

We  simulate  the  room  impulse  responses  hji{n)  for  various  reflection  coeflficients,  p  which 
determines  the  “liveliness”  of  the  acoustic  environment.  For  simplicity,  we  set  p  the  same 
for  all  six  walls  in  the  simulated  room.  The  average  SIRI  using  the  modified  MRFD  of  20 
random  pairs  of  speech  mixtures  are  measured  and  pictured  in  Fig.  26.  As  expected,  higher 


30 


Figure  25:  The  tradeoff  between  separation  performance  and  processing  time. 
SIRIs  are  achieved  for  the  room  with  smaller  reflection  coefficient,  i.e.  less  reverberation. 


Figure  26:  SIRI  performance  in  simulated  room  with  different  reverberation. 
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4.2.1  Various  Room  Sizes 

In  this  section,  the  room  geometry  in  Fig.  9  varies  with  a  in  the  range  of  4m  to  30m 
according  to  Table  5.  The  reflection  coefficient  is  p  =  0.3  for  each  of  six  walls.  Fig.  27 
shows  the  resulting  SIRI  obtained  by  varying  room  size  parameter  a.  As  one  can  see,  the 
separation  performance  improves  with  increasing  room  size,  which  is  also  due  to  the  decreased 
reverberation  as  in  the  case  of  varying  reflection  coefficients. 

Table  5:  Simulated  room  parameters. 


Parameters 

Values 

Sourcel  Coordinates 

(f-l,a-2,1.5) 

Source2  Coordinates 

(§  +  l,a-2,1.5) 

Microphone  1  coordinates 

(f-l,a,1.7) 

Microphone  2  coordinates 

(f  + 1,0. 1.7) 

Room  Dimensions 

(a,  Cl,  2) 

Figure  27:  SIRI  performance  in  simulated  room  with  varying  room  size. 


4.3  Results  with  Two  Speakers  and  Additive  Background  Noise 

If  a  noise  source  is  present  in  addition  to  the  two  speech  sources,  separation  of  the  speech 
signals  will  still  proceed  but  this  noise  will  be  present  in  each  separated  signal.  In  addition. 
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SIRI  gracefully  degrades  as  the  SNR  decreases  (noise  power  increases).  Fig.  28  illustrates 
the  experimental  setup  and  Fig.  29  shows  the  SIRI  performance  as  a  function  of  SNR.  This 
finding  and  motivates  the  requirement  for  additional  denoising  of  correlated  noise  sources 
from  the  speech  signals  in  order  to  obtain  accurate  ASR. 


Figure  28:  Block  diagram  of  BSS  with  background  AWGN. 


Figure  29:  SIRI  performance  in  various  levels  of  background  noise. 

Prom  Fig.  29,  we  can  see  that  3  —  lOdB  SIRI  can  be  achieved  when  the  mixtures  are 
buried  in  background  noise  (-lOdB  <  SNR  <  OdB)  as  in  a  vehicle  or  aircraft  or  other  noisy 
environment.  When  the  average  power  ratio  of  the  mixed  speech  signals  to  the  additive  noise 
exceeds  lOdB,  the  MMRFD  algorithm  yields  a  15.5dB  SIRI  in  an  average  sense.  We  can 
therefore  conclude  that  MMRFD  is  robust  under  background,  wideband  noise  conditions. 
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4.4  Results  with  Short  Mixture  Signals 

Up  to  now,  all  simulations  have  used  mixing  filters  with  fixed  impulse  responses.  However, 
the  channel  impulse  responses  may  change  dramatically  even  for  the  speaker  just  turning 
his/her  head  10  —  20  degrees,  or  leaning  backward  a  couple  of  inches  [17].  In  practical 
environments,  the  algorithm  must  be  able  to  rapidly  adapt  in  these  dynamic  situations. 

We  assume  that  during  the  time  that  multiple,  second  order  statistics  are  estimated,  the 
mixing  filters  remain  approximately  the  same  [10].  The  question  we  ask,  however,  is  can  we 
collect  enough  samples  for  adaption  before  the  change  of  mixing  filters  even  under  a  slowly 
varying  acoustical  environment?  Or,  equivalently  what  is  the  amount  of  data  needed  by  the 
MRFD  algorithm  for  successful  signal  separation?  In  order  to  answer  these  questions,  we 
conduct  an  experiment  to  test  the  potential  of  an  online  application  of  the  MRFD  algorithm. 
Fig.  30  shows  the  SIRI  for  various  input  signal  lengths  with  varying  number  of  iterations  L. 
As  expected,  we  see  that  with  increasing  signal  length,  the  performance  of  MRFD  increases. 
Quantitatively,  the  performance  of  MRFD  approaches  about  14dB  SIRI  with  just  2sec  of 
input  at  sampling  rate  of  16kHz.  We  note  that  this  is  the  upper  bound  of  SIRI  for  most 
other  off-line  algorithms  with  input  signal  lengths  of  30  to  50  seconds  [18],  [19],  [20]. 


Figure  30:  SIR!  versus  length  of  input  mixtures. 

Even  if  there  is  enough  data  available,  another  question  is  posed  in  [17]:  Is  it  possible  to 
perform  the  necessary  computation  in  real  time?  Our  computational  benchmarks  indicate 
30sec  of  input  speech  mixtures  recorded  in  a  real  room  can  be  successfully  separated  in  75 
sec  with  more  than  20dB  enhancement  (Matlab  6.0,  DeU  Precision  330  with  1.8GHz  P4, 
1GB  RAM).  Fig.  31  shows  the  processing  time  versus  the  length  of  input  for  three  different 
numbers  of  adaptive  iterations.  We  can  see  from  Fig.  30  and  Fig.  31  that  properly  reducing 
the  amount  of  iterations  can  dramatically  save  processing  time  while  maintaining  relatively 
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high  SIRI.  In  addition,  we  have  indications  of  near-real-time  performance  when  MRFD  is 
implemented  in  C  (as  previously  described  in  Section  3.7)  and  run  on  a  PC.  Therefore 
real-time  performance  on  a  dedicated  DSP  should  not  be  a  problem. 


Figure  31:  Processing  time  versus  length  of  input  mixtures. 


4.5  Results  with  One  Speaker  in  the  Presence  of  Background 
Noise 

If  a  background  noise  (white  noise  or  even  music)  is  present  in  a  single  speaker’s  speech 
signal,  MRFD  can  be  used  to  separate  the  single  speech  signal  from  the  background  noise 
with  SIRJs  comparable  to  the  two  speaker  case.  This  results  from  the  basic  assumption  that 
the  sources  are  uncorrelated.  The  system  model  is  illustrated  in  Fig.  32,  where  S2(n)  is  now 
a  White  Gaussian  Noise  (WGN)  signal. 


Figure  32:  Block  diagram  of  noise-cancellation  problem. 
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Simulations  results  indicate  that  we  achieve  16.4dB  measured  by  (3.23)  and  (3.24)  when 
the  speech  signal  power  is  approximately  the  same  as  the  noise  noise  power,  i.e.  input  SNR 
=  OdB.  In  the  case  of  knowing  that  there  is  no  cross-channel  contribution  from  si(n)  to 
X2(n),  i.e.  pure  noise,  we  can  apply  a  constraint  of  W21  =  0,  which  increase  the  output  SNR 
by  5.03dB  to  21.46dB.  Fig.  33  is  a  plot  of  output  SNR  versus  the  SNR  of  noise-corrupted 
input  speech  signals  x(n).  R'om  this  performance  plot,  we  can  see  when  the  speech  signal 
is  buried  in  the  background  noise  (— 20dB  <  SNRi  <  OdB),  we  can  achieve  23  —  30dB  SNR 
improvement  (SNRI),  which  shows  successful  noise-cancellation.  When  the  the  source  speech 
signal  dominates  the  mixture  (SNRi  >  40dB),  no  SNRI  can  be  seen  since  there  is  little  noise 
to  be  cancelled  in  the  first  place. 


Figure  33:  Signal-to-Interference  Ratio  Improvement  as  a  function  of  noise  after  two-channel 
BSS  processing. 
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5  Conclusions 


In  the  blind  source/speech  separation  problem,  we  attempt  to  sepai’ate  out  individual,  speech 
signals  from  a  background  of  other  undersired  speech  signals  and  other  acoustic  noise  sources. 
In  this  research,  we  have  gained  an  understanding  of  the  source  separation  problem  for 
convolutional  mixtures  and  developed  expertise  with  the  Multiresolution  Frequency-Domain 
algorithm.  We  have  improved  the  MRFD  algorithm  by  developing  methods  for  exploiting 
symmetry  in  order  to  reduce  computation  by  50%  and  developing  initialization  strategies 
for  better  performance.  We  have  experimentally  determined  optiirial  parameters  for  the 
algorithm  and  studied  the  effects  of  room  size  and  liveliness  on  separation  performance. 
Simulations  with  both  digitally-filtered  and  mixed  speech  signals  as  well  as  real-world  signals 
indicate  good  performance  with  this  algorithm  with  10-20dB  improvements  to  the  Signal-to- 
Interference  Ratio  of  these  sources.  In  the  important  case  of  speech  enhancement,  i.e.  blind 
suppression  of  background  noise  from  a  desired  speech  signals,  we  have  been  able  to  increase 
SIR!  upwards  of  20dB.  Finally,  our  work  has  demonstrated  a  robustness  to  separating  speech 
signals  in  the  presence  of  noise.  Furthermore,  we  have  carefully  examined  the  potential 
for  real-time  processing  with  this  algorithm  and  have  concluded  that  good  performance  is 
possible  but  with  at  least  a  2sec.  delay  due  to  the  inherent  “block  processing”  nature  of  the 
algorithm. 

In  addition,  as  part  of  the  deliverables  for  this  grant,  we  have  developed  a  blind  speech 
signal  corpus  or  database  containing  speech  signal  mixtures  in  a  variety  of  acoustical  envi¬ 
ronments.  This  database  is  now  available  on  the  Internet  at 

http : //www . ece . nmsu , edu/"pdeleon/BSS/index . html 
for  free  download. 
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6  Recommendations 


We  have  demonstrated  that  speech  signals  can  be  significantly  enhanced  despite  little  knowl¬ 
edge  regarding  the  speech  signal  itself,  background  noise  signals,  or  acoustical  environment. 
The  primary  constraints  are  the  need  for  multi-channel  input  signals  and  a  2sec.  delay  in 
producing  the  enhanced  signal.  Given  these  constraints  it  may  be  possible  to  apply  this  work 
to  improving  automatic  speech  recognition  in  adverse  environments,  although  it  is  unknown 
at  this  time,  the  direct  impact  SIRI  has  on  ASR  accuracy. 

In  most  environments,  there  will  be  more  than  two  sources  of  speech  and  noise.  In  this 
case,  we  require  extension  of  the  two-channel  MRFD  to  the  multi-channel  case  involves  gen¬ 
eralization  of  (3.14).  While  this  is  fairly  straightforward,  two  problems  must  be  investigated 
in  order  for  multi-channel  BSS  algorithms  to  be  realized  to  their  full  potential.  First,  the 
multi-channel  BSS  algorithms  will  be  very  computationally  complex  with  complexity  grow¬ 
ing  as  the  square  of  the  number  of  channels.  With  more  than  two  acoustic  sources,  it  is  not 
expected  that  the  algorithms  could  be  computed  in  real-time  with  current  and  near-term 
DSP  technology.  This  problem  might  be  addressed  with  a  specialized  analog  co-processor. 
Second,  the  permutation  inconsistency,  where  separated  sources  may  contain  spectral  energy 
at  certain  frequencies  from  the  other  source  and  vice  versa,  will  be  exacerbated  in  the  multi¬ 
channel  case.  In  this  case,  there  are  now  many,  many  more  permutations  which  result  from 
simple  row  swaps  of  the  unmixing  filter  matrix.  This  problem  may  be  resolved  through  the 
multi-resolution  approach  (or  some  version  of  it).  The  idea  is  that  with  short  unmixing  filter 
lengths,  Q,  the  solution  is  smooth  in  the  frequency  domain  and  for  the  most  part  the  per¬ 
mutations  are  self-aligning.  In  order  to  increase  spectral  resolution  and  subsequently  SIRI, 
Q  is  increased.  Thus  for  the  multi-channel  BSS  algorithm,  we  may  require  more  stages  to 
bring  the  filter  length,  <5,  up  slowly  in  order  to  maintain  permutation  consistency. 
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Appendix  A  -  Derivation  of  the  Cost  Function  Gradient 

We  know  that  for  any  real-valued  function  f{z)  of  a  complex  variable  z,  the  gradient  Vzf{z) 
can  be  obtained  by 

V  f(z)  =  Mil  +  Mil" 

_  Sfe  (Ai) 


We  can  expand  (3.9)  as 

R  I,.,  M  -  \  1  r  4..(w,fc)  R.,Mk)  1  f  ivjiH 

'  1  w^,(u)  Wn{<^)  J  [  R.Ju,,k)  J  [  w;,{u,)  w;,(w) 

_  [  ^  Ai2(^)^)  1 


[Ry^,iuj,k)  X  J 
which  is  a  sequence  of  Hermitian  matrices  with 

HyA^,k)  =  [Wii(tj)An((^,fc)  +  W^i2(a^)4.,(^,A:)]VK2*i(a;) 

+  -H  Wi2(a;)A..(a;,fc)]  ^2*2(0;) 

-  [W2i(a;)4„(u;,A:)  +  W22(a;)4,,(a;,A:)] 

-h  [W2i(a;)4nKfc)  +  W22(a;)^x32(‘^,fc)]  VKiVu;). 


Therefore, 


_  \  n  dJUjj) 


=  2E 


^  \S\Ryn{'^,k)?  ,  a|A,..(‘-,fc)l" 


fell  «W"(a.) 


■aw*(a;) 


AT 

fc=l 

=  2f;V(u.,t)D 

k^l 

K 

=  2'£Y{u,k)W{uj)M^,k). 

k=zl 


dRy^^{uj,k) 

aw*(w) 


A  = 

0  0 
Wn{ijj)RxiA^,k)  +  Wi2{u)R:,^^{u,k)  Wxi{(jj)R:,^^{(jO,k)  +  Wi2(u;)^,2j(w,fc) 
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B  = 


+  W22{(^)Rx2i{(^,k)  W2l{uj)R:,^^{uJ,k)  +  W22{uj)Rx22i^>^) 
0  0 


and 


D  = 

+  Wi2{u})Rx2ii^,k)  Wniu!)^,2{^,k)  +  W^i2(w)^x22(w, fc) 
W2i{ij)Rxii{oJ,k)  +  VK22(a;)fii2i(a;,fc)  iy2iM-Rii2(‘^,  fc)  +  H^22(w)^x22(‘^,  fc) 
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Appendix  D  -  MATLAB  Simulation  Codes 

mrfd208.m 


I - 

y.  CO-CHANNEL  CONVOLUTIVE  MIXED  SPEECH  SEPARATOR 

•/. 

7.  This  MATLAB  code  implements  the  blind  speech  separation  algorithm  of  Ikram 
y,  and  Morgan  [1] .  The  problem  is  as  follows.  Given  two  signals,  xl  and 
y,  x2  each  containing  a  convolutive  mixture  of  two  source  speech  signals,  si  and 
y,  s2  produce  separated  speech  signals,  yl  and  y2  such  that  yl  "  si,  y2  ~  s2. 
y  This  is  a  blind  separation  problem  since  we  only  have  xl  and  x2  anH  no 
y,  other  additional  information.  The  approach  assumes  the  \incorrelatedness 
'/,  between  the  original  source  signals.  We  then  attempt  to  diagonalize  the 
y,  estimated  Power  Spectral  Density  (PSD)  matrices  at  multiple  time  segments 
y,  to  decorrelate  the  mixtures  in  frequency-domain  (avoiding  the  deconvolution 
y  in  time-domain)  to  achieve  the  sepeiration. 

•/. - - - 

y.  Notes 

%  1.  This  MATLAB  implements  the  modified  version  of  the  algorithm  in  [1]  as 
y,  given  in  Table.  3  in  this  thesis. 

y. 

y,  2.  All  signals  stored  as  column  vectors. 

•/. - 

y.  Version  History 

y. 

y,  2.05  (05  Sep.  01)  Cleaned  up  the  mess  and  added  comments. 

7. 

y,  2. QA  (26  Aug.  01)  Selected  the  optimal  parameters,  and  changed  K  to  6. 

7. 

7,  2.03  (25  Aug.  01)  Fixed  a  bug  on  the  implementation  of  the  symmetric 
y,  constraint,  the  correct  one  is  W( omega)  =W(Q-omega+l) 

5*  for  omega=l:Q,  so  we  need  to  seek  W(omega)  for 

y»  omega=l;T/2+l.  Added  the  objective  measurement  to  evaluate 

how  well  the  diagonalization  worked  at  each  iteration. 

7. 

y,  2.02  (14  Aug.  01)  Modified  the  MRFD  by  the  reinitialization  of  W(omega)  at 
y«  each  stage . 

7. 

y,  2.01  (27  Jul.  01)  Changed  the  required  input  signal  length  of  51  seconds  as 
*/.  the  experiments  in  [1]  to  any  length  N.  Fixed  K=4,  thus 

y  calculate  U=N/(beta*T)  and  M=U/K. 

y. 

7,  2.00  (27  Jun.  01)  Major  rewrite  with  Prof.  De  Leon. 
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y.  (1)  Changed  the  truncation  of  the  ini-mixing  filters  from  frequency-domain 
y,  to  time-domain  to  smooth  the  frequency  responses  and  solve  the 

y,  permutation  problem. 

y,  (2)  Added  the  S3unmetric  constraint  on  the  un-mixing  filters  in  frequency- 
y,  domain  so  that  we  only  needed  to  seek  W(omega)  for  omega=l:T/2,  and 

y,  reduce  the  computational  complexity  by  half. 

y,  (3)  Replaced  most  of  the  loop  commands  by  matrix-based  calculation  to 
y,  reduce  the  computational  time. 

y,  (4)  Tried  to  normalize  the  Gradient  by  its  own  power  instead  of  the 
y,  average  power  of  mixtures,  but  the  SIRI  performance  was  not  as  good 

y,  as  the  original  normalization  approach. 

y. 

y,  1.03  (1  Apr.  01)  Modified  the  approach  to  synthesize  the  output  signals  from 
y,  overlap-save  to  overlap-adding,  by  which  the  separated  signals  sound 

y,  normal  though  no  separation  yet. 

y. 

y,  1.02  (10  Mcir.  01)  Fixed  a  bug  on  the  Hamming  window  function.  Measured  the 
y,  separation  of  instantaneous  mixtures. 

y. 

y,  1.01  (5  Feb.  01)  Added  the  intermediate  separation  process  before  the  second 
y,  stage,  and  also  update  the  PSD  matrices  at  multiple  time  segments. 

y. 

y,  1.00  (Jan  30,  01)  First  version  based  on  the  original  understanding  or  MRFD 
y,  reported  by  Ikram  and  Morgan  in  [1]  .  In  this  version: 
y,  (1).  We  took  the  gradient  term  as  given  in  [2]  without  derivation, 

y,  (2).  We  used  all  the  parameters  given  in  [1],  and  Q  -  [500,  2048]’.  In 

y,  the  first  stage,  Ql=500,  we  only  seek  the  un-mixing  matrices  for 

y,  the  first  Q1  frequencies,  while  for  the  rest  (Q2-Q1)  frequency  bins, 

y,  we  set  W(omega)  =  [l,  0;  0,  1]. 

y,  (3).  For  the  synthesis  of  the  sepaurated  signals,  we  first  passed  the 
y,  mixture  vectors  through  the  un-mixing  filter  matrices  W(omega)  in 

y,  frequency-domain  (multiplication) ,  then  converted  the  output  signals 

y,  back  to  the  time-domain,  finally  we  divided  them  by  the  window 

y,  function  and  took  use  of  the  overlap-save  approach. 

y,  [1]  M.  Ikram  and  D.  Morgan,  "A  multiresolution  Approach  to  Blind  Separation 
y,  of  Speech  Signals  in  A  Reverberant  Environment",  In  Proc.  IEEE  ICASSP, 
y,  Pages  2757-2760,  2001. 

y«  [2]  L.  Parra  and  C.  Spence,  "Convolutive  Blind  Sepairation  of  Non-St  at  ionary 
y,  Sources",  IEEE  Transactions  on  Speech  and  Audio  Processing,  8(3) :320-327, 
y.  May  2000. 

y. 

I - - - - - 

y,  Bug  Reports 

’/c  Report  all  bugs  to  Prof.  Phillip  De  Leon  (pdeleon@nmsu.edu) 
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clear;  close  all;  clc 

I - 

*/,  Set  initial  parameters 

T  =  2048;  */,  Basic  block  length 

beta  =  0.5;  V,  Overlapping  factor 

K  =  6;  ■/,  The  total  number  of  superblocks 

alfa  =  1.0;  V,  Nozmalized  step  size 

L  =  100;  */,  Number  of  iterations 

q  =  [500;  2048] ;  %  Multistage  parameters 


y. - 

7,  Real-world  mixtures 

mixl  =  ’c:\nchen\research\mrfd\altinl.wav’;  mix2  = 

’  c :  \nchen\resecirch\mrf  d\altin2 .  wav ' ; 

7,  The  directory  used  to  save  the  separated  signals 
out  =  ’c:\nchen\research\mrfd\out\final_mrfd\altin\’; 
7i  Read  mixtures  into  vectors 

[xtl,Fs]=wavread(mixl) ;  [xt2,Fs]=wavread(mix2) ; 


7. - - - - - 

7  Determine  the  parameters;  N,  U,  and  H 

N  =  min(length(xtl) ,  length(xt2)) ;  7.  Signal  length  in  samples 
N  =  floor(N/T)*T; 

xtl  =  xtl(l:N);  xt2  =  xt2(l:N);  %  Truncate  signal  to  int.  number  of  blocks 

U  =  floor(N/(beta*T)-l) ;  7.  Number  of  blocks 

M  =  floor(U/K);  %  Number  of  blocks  in  each  superblock 

% - - - - 

7  Initialize  Hamming  window 

t  =  [0:T-1]';  w  =  0.54  -  0.46*cos(2*pi*t/(T-l)) ; 

X— - 

7  Short-Time  Fourier  Transform 
xwl  =  zeros(T,l);  xw2  =  zeros(T,l); 

xl  =  zeros (T,N/ (beta*T) )  ;  x2  =zeros(T,N/(beta*T)) ; 

for  m  =  1:U 

block  =  beta+T+Cm-l) ; 

xwl  =  w.*xtl(block+l:block+T) ; 

xw2  =  w.*xt2(block+l:block+T)  ;7.  The  column  vector  of  windowed  mixtures 
xl(:,m)  =  fft(xwl);  7  xl,  x2  aire  (T  *  MK)  matrices  with  column  vector... 
x2(;,m)  =  fft(xw2);  7  as  the  FFT  of  each  block 
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end; 


•/. - 

*/,  Initializing  the  nn-mixing  filter  matrices 
W  =  zeros(2,2,T);  W(l,l,:)  =  1;  W(l,2,:)  =  0;  W(2,l.:)  =  0; 
W(2,2,:)  =1; 

%  Four  un-mixing  filter  vectors 

Wll  =  zerosCT.l);  W12  =  Wll;  W21  =  Wll;  W22  =  Wll;  Y1  = 
zeros (T,l);  Y2  =  zeros(T,l); 


7.7,  MULTISTAGE  ITERATION  7.7. 

for  q  =  1: length (Q) 

7.  Compute  the  covariance 

Rx=zeros(2,2,T/2+l,K) ;  7.  only  calculate  Rx  on  omega=l:T/2+l  due  to  symmetry 
for  k  =  1:K 

block  =  M*(k-1); 

Rx(l,l,:,k)  =  sum(xl(l:T/2+l,block+[l:M]) .*  ... 

conj (xl (1 : T/2+1 , block+ [1 :  M] ) ) , 2) . /M ; 

Rx(l,2,:,k)  =  sum(xl(l:T/2+l,block+[l:M]) .*  ... 

conj  (x2 (1 ; T/2+1 ,block+[l:M] )) ,2) ./M; 

Rx(2,l,:,k)  =  conj (Rx(l,2, : ,k)) ;  %Hermitian  property  of  the  PSD  matrices 
Rx(2,2,:,k)  =  sum(x2(l:T/2+l,block+[l:M])  .*  ... 

con j  (x2  ( 1 :  T/2+1 ,  block+  [  1 :  M]  ) ) ,  2)  .  /M  ; 

end; 


7.  Average  power  of  the  mixtures  at  each  frequency 
Mu  =  zeros (T/2+1, 1) ; 
for  omega  =  1: T/2+1 
Fronorm  =  0; 
fork=l:K 

Fronorm  =  Fronorm  +  norm(Rx( : , :  ,omega,k) , 'fro’)  .  *2;  7.Frobenius-norm 
end; 

Mu(omega)  *  alfa/Fronorm;  7.  Step  size 
end; 


7.  Reinitializing  the  un-mixing  filter  matrices  at  each  stage 
Ws  -  zeros (2, 2, T) ; 

Ws(l,l,:)  =  1;  Ws(l,2,:)  =  0;  Ws(2,l,:)  =  0;  Ws(2.2,:)  =  1; 


for  1  -  1:L  7.  Iteration  for  L  times 

for  omega  =  1:  T/2+1  7.  Seek  WComegaj  at  every  frequency  bin 
Gradt  =  zeros (2, 2) ; 7.  Initializing  Gradient 
for  k  =  1:K 
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V  =  Ws(: ,omega)*Rx(: .omega, k)*Ws(: .omega)  ’ ; 

V  =  V  -  diag(diag(V)) ;  7, cost  function 

Gradt  =  Gradt  +  V*Ws( : , : , omega) *Rx( : , :  .omega, k) ; 
end; 

Gradt  =  2  *  (Gradt-diag(diag(Gradt))) ;  */.  Gradient 
Ws(: .omega)  =  Ws omega)  -  Mu(omega)*Gradt ;  */,  Update 
end; 

7,*/,  Apply  the  S3^etric  constraint 
Ws(:.:,T/2+2:T)  =  conj (Ws(: , : ,T/2:-l :2)) ; 

7.7.  Truncation  of  W(omega)  in  time-domain 
Wll=reshape(Ws(l.l, :),T,1);  W12=reshape(Ws(l,2, :) .T.l)  ; 

W21=reshape ( Ws (2 , 1 , : ) , T . 1 ) ;  W22=re shape (Ws (2 , 2 , : )  ,  T ,  1 ) ; 

Wll=ifft(Wll);  W12=ifft(W12);  W21=ifft(W21) ;  W22=ifft  (W22) ;  7.  to  TD 
Wll=fft(Wll(l:Q(q)),T);  W12=fft(W12(l  :Q(q))  ,T) ;  7.  to  frequency  domain 

W21=f f t (W21 ( 1 : Q (q) ) , T) ;  W22=f f t (W22 ( 1 : Q (q) ) ,  T)  ; 

Ws(l,l.:)=Wll;  Ws(l,2,:)=W12;  Ws(2. 1, : )=W21;  Ws(2,2. ; )=W22; 
end;  7.  End  of  L  iterations 

%  Current  Stage  Separation 
for  m  =  1:U 

Y1  =  Wll.*xl(:.m)  +  W12.*x2(;  ,m) ; 

Y2  =  W21.*xl(; ,m)  +  W22.*x2(: ,m) ; 
xl(:,m)  =  Yl; 
x2(:.m)  =  Y2; 

end; 

%  Update  the  overall  un-mixing  filter  matrices 
for  omega  =  1;T 

W(:,:, omega)  =  Ws(: .omega)  ♦  W(: .omega) ; 
end; 


END  of  MULTISTAGE  ITERATION  7,7, 

- 

- y 


end; 

7.7. 

7. — 
7. — 


y, - 

7.  Inverse  Fourier  Transform  and  synthesize  separated  signals  by  Overlap-adding 
block  =  beta*T;  yyl  =  zeros(T,l);  yy2  =  yyl;  yl  =  3ryl;  y2  =  yyl; 
for  m  =  1:U 

yyl  =  real(ifft(xl(: ,m))) ; 
yy2  =  real(ifft(x2( : ,m))) ; 

LI  =  length(yl); 

L2  =  length (y2) ; 

yl  =  [yl(l:Ll-block);yl(Ll-block+l:Ll)+yyl(l:block)  ;yyl (block+1  :T)]  ; 
y2  =  [y2(l:Ll-block);y2(Ll-block+l: LI) +yy2(l: block)  ;yy2 (block+1  :T)]; 
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end;  e  =  10“ (-7) ; 

yl  =  yl./(niax(abs(yl))+e) ;  y2  =  y2./(max(abs(y2))+e) ;  7,  Normalize 


y. - - 

%  Save  separated  signals 

wavwrite(yl,Fs,strcat(out ,  ’altin_T’  ,ntim2str(T)  ,  ’_l.wav’))  ; 
wavwrite(y2,Fs,strcat(out ,  ’altin.T'  ,num2str(T)  ,  '_2.wav’))  ; 


mrfdSIM203.  m 

7. - - - — - - - - - 

7,  CO-CHANNEL  CONVQLUTIVE  MIXING  k  UN-MIXING  (MRFD)  SIMULATOR 
*/• 

7t  This  MATLAB  code  simulates  the  convolutive  mixing  process  using  the  simulated 
7.  impulse  responses  generated  by  the  "image  method"  [1]  and  implements  the 
7.  blind  speech  separation  algorithm  of  MRFD  by  Ikram  and  Morgan  [2].  The 
7.  problem  is  as  follows.  Given  two  source  signals,  si  and  s2,  and  four  mixing 
7.  filters  simulated  by  the  "image  method",  we  generate  the  mixtures  xl  and  x2 
7t  by  x_i  =  s_i*h_ij  +  s_j*h_ij,  where  is  the  linear  convolution  operator. 

7t  Then  we  implement  MRFD  based  on  the  statistics  collected  from  the  xl  and  x2 
7.  and  produce  separated  speech  signals,  yl  and  y2  such  that  yl  “  si,  y2  “  s2. 

7.  Finally,  we  determine  the  Signal -to-Noise-Ratio  Improvement  (SIRI) . 

7. 

7.  [1]  J.  B.  Allen  and  D.  A.  Berkley,  "Image  Method  for  efficiently  simulating 
7.  small-room  acoustics" ,  J,  Acoust.  Soc.  Am.,  65(4)  :943 — 950,  1979. 

7. 

7o  [2]  M.  Ikram  and  D.  Morgan,  "A  multiresolution  Approach  to  Blind  Separation 
7.  of  Speech  Signals  in  A  Reverberant  Environment",  In  Proc.  IEEE  ICASSP, 

7.  Pages  2757-2760,  2001. 

•/. - - - - - - 

7»  Notes 

%  1.  This  MATLAB  implements  the  modified  version  of  the  algorithm  in  [1]  as 
7t  given  in  Table.  3  in  this  thesis,  and  measures  the  average  SIRI 

%  performance  for  the  selection  of  optimal  parameters  and  analysis  of 
7.  MRFD  under  various  reverberant  environment. 

7, 

7«  2.  All  signals  stored  as  colmnn  vectors. 

7. 

y. - - - - - - - 

7.  Version  History 

7. 

7o  2.03  (Oct.  10,  01)  Cleaned  up  mess  and  added  comments. 

7. 

7o  2.02  (Sep.  10,  01)  Added  the  second  measurement  of  SIRI,  which  could  be  used 
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•/. 

•/. 

7.  2.01 
7. 

7. 

7. 

7.  2.00 
7. 

7. 

7. 


when  the  mixing  filters  were  unknown. 

(Sep.  01,  01)  Rewrote  the  main  code  to  be  able  to  measure  the  average 
SIRI  of  num_sour  pairs  of  mixtures  with  the  source 
speeches  randomly  selected  from  TIMIT. 

(Aug.  06,  01)  Changed  the  mixing  filters  by  the  simulated  impulse 

responses  of  real  room  generated  through  the  image  method 
in  [3] 


7.  1.02  (Jul.  22, 

7. 

7. 

7. 

7. 


01)  Changed  the  mixing  filters  by  the  impulse  responses  given 
at:  Prof.  De  Leon’s  BSS  website: 

http :  //WWW .  ece .  nmsu .  edu/"pdeleon/BSS/index .  html . 
Truncated  those  impulse  responses  to  the  first  T  samples. 


7.  1.01  (Feb.  28,  01)  Added  the  calculation  of  Signal  to  Interference  Ratios 
7t  Improvement (SIRI ) ,  the  impulse  responses  were  offered  by  Prof.  De  Leon, 

7.  with  direct-channel  response  hii  from  the  speaker  i  to  the  telephone 
%  transmitter  i,  and  cross-channel  response  hij  (i~=j)  from  speaker  j 

7  to  telephone  i  over  the  phone  line,  we  truncated  the  impulse  responses 

7  to  get  the  first  T  samples. 


7 


7  1.00  (Jan.  30,  01)  First  version  based  on  the  code  mrfd.real.m  version  1.0. 
y. - 


7  Required  routines:  At  least  one  of 

7  siri_rho.m  (determine  SIRI  with  varying  reflection  coefficients), 

7  siri.rms.m  (determine  SIRI  with  varying  room  size). 


7 

7 

7 

7 

7 

7- 


iBiag_filt.m  (simulate  the  room  impulse  response  by  image  method), 
sroom.m  (simulate  the  room  impulse  response  by  image  method), 
Ithimage.m  (simulate  the  room  impulse  response  by  image  method), 
siri_measl.m  (measure  SIRI  with  known  sources  and  mixing  filters), 
siri_meas2.m  (measure  SIRI  without  known  sources  and  mixing  filters). 


clear;  close  all;  clc; 

y. - - - 

7  Initializing  parameters 

num_sour  =  20;  7  Number  of  random  pairs  for  average  SIRI  measurement 
rho  -  0.1:0. 1:1.0;  7  Analysis  of  MRFD  for  Veirious  reflection  coefficients 
7  rho  =  0.3  7  Fixing  rho  when  analyzing  other  pcirameters 
T  =  2048; 

7a  =  [4:2:30]’;  7  Analysis  of  varying  room  size  [a,  a,  a/2] 


y. - 

7  Set  work  directory 
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database  =  'c:\nchen\TIMIT';  %  Location  of  TIMIT  database 
f unc_path  =  ‘  c :  \nchen\reseeirch\mrf  d\ref  lect  ’ ; 

out_fig  =  ‘C:\nchen\research\mrfd\reflect\ana_reflect.fig’; 

path  =  path (path, f Tonc.path) ;  %  Add  function  path  into  Matlab  search  path 


y,  - - - - - - - - - - 

y,  Get  TIMIT  database  directory  structure 
eveil(strcat(‘cd(“  ‘ ,  database, 
inpath  =  dir;  *h  Get  the  "dir"  structure 
n=size  (inpath) ; 

inpath  =  inpath(3:n(l)) ;  ‘/.cut  the  first  two  elements,  which  are  and 

y  - - - - - - - - 

y.  Determine  average  SIRI 
SIRI  =  zeros (lenjgth(rho) ,  1) ; 

for  i  =  l:num_sour  ‘f.  Determine  the  SIRI  for  each  pair  of  mixture 
samps  =  ceil (600. ♦rand( 1,2)) ; 

path  =  inpath(samps) ;  */,  Randomly  choose  2  speech  signals 
for  j  =  1:  length (rho)  '/, 

SIR  =  siri_rho(path(l) .name,  path(2) .name,  rho(j)); 
y,  RL  [a(j);  a(j);  a(j)/2];  '/.  Room  dimension 

y,  SIR  =  siri_rms(path(l)  .name,  path(2)  .name,  RL) ;  V,  SIRI  performance  with 
SIRI(j)  =  SIRI(j)  +  SIR;  */,  varying  room  size 

end; 
end; 

SIRI  =  SIRI./num_sour;  '/,  Average  SIRI 


y,  - - - - - 

y,  Plot  SIRI  versus  varying  reflection  coefficients 

figure;  plot (rho, SIRI) ;  xlabeK ’Reflection  Coefficient  \rho’); 

ylabeK’SIRI  (dB)’);  grid;  saveas(gcf  ,out_fig) ; 


sirLrho.m 

function  SIRI  ==  siri_rho(sourcel,  source2,  rho); 

y. - - - - - 

y,  SIRI  PERFORMANCE  ANALYSER  (For  varying  reflection  coefficients) 

y, 

y,  This  MATLAB  code  implements  the  MRFD  under  vaurious  reverberant  environment 
y,  by  varying  reflection  coefficients  in  the  "image  method"  simulation. 

y. 

y,  Call  Syntax:  SIRI  -  siri_rbo(sourcel ,  source2,  rho) 

% 
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y,  INPUT  arguments: 

7,  Name:  sourcel/2 
y,  Type :  string 

7,  Description:  File  name  of  source  speech  signal  (.wav) 

y. 

7,  Name:  rho 
y.  Type:  scalar 

y.  Description:  Reflection  Coefficient 

y. 

X  OUTPUT  arguments : 

7.  Name:  SIRI 
7,  Type:  scalar 

%  Description:  Signal-to-Interference  Ratio  Improvement 

7. 

y,  Creation  Date:  Aug.  06,  2001 
y,  Last  Revision:  Sept.  10,  2001 
- 


I - - - 

%  Set  initial  parameters 

T  -  2048;  */,  Basic  block  length 

beta  =  0.5;  7,  Overlapping  factor 

K  =  6;  y.  The  total  number  of  superblocks 

alfa  =  1.0;  %  Normalized  step  size 

L  =  100;  7,  Number  of  iterations 

Q  =  [500;  2048];  7,  Multistage  parameters 


•/. - 

7,  Simultate  Mixing  filters 

[hll.hl2,h21,h22]  =  imag_filt(rho) ;  Hll  =  fft(hll);  H12  = 
fft(hl2);  H21  =  fft(h21);  H22  =  fft(h22); 


7. - - 

7,  Source  Signals 

[stl.fl]  =  wavread(sourcel);  [st2,f2]  =  wavread(source2) ; 

*/.. - - - 

7,  Determine  the  parameters:  N,  U,  and  M 

N  =  aindengthCxtl) ,  length(xt2)) ;  7,  Signal  length  in  samples 
N  =  floor (N/T)*T; 

stl  =  stl(l:N);  st2  =  st2(l:N);  7,  Truncate  signal  to  int.  number  of  blocks 

U  =  floor  (N/(beta*T) -1)  ;  7,  Nvunber  of  blocks 

M  =  floor  (U/K);  7,  Number  of  blocks  in  each  superblock 
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y.  Initialize  Hamming  window 

t  =  [0:T-1]*;  w  =  0.54  -  0.46*cos(2*pi*t/(T-l)) ; 


I - - - - - - - - - - - 

7,  Short-Time  Fourier  Transform  and  Mixing  Process  by  Circuleir  Convolution 
swl=  zeros (T,l)  ;  sw2=  zeros (T,l) ; 

si  =  zeros(T,U);  s2  =  zeros(T,U); 

xl  =  zeros(T,U);  x2  =  zeros(T,U);  pack; 


for  m  -  1:U 

block  =  beta*T*(m-l) ; 

swl  =  w.*stl(block+l:block+T) ; 

sw2  =  w.*st2(block+l:block+T) The  column  vector  of  windowed  source  signals 
sl(:,m)  =  fft(swl);  7,  si,  s2  are  (T  *  MK)  metrics  with  column  vector... 

s2(:,m)  =  fft(sw2);  7  as  the  FFT  of  each  block 

xl(:,m)  =  H11.!)‘s1(:  ,m)  +  H12.*s2(:  ,m) ;  7o  Mixing  process  by  circular 

x2(:,m)  =  H21.*sl(:  ,m)  +  H22.*s2(:  ,m) ;  7.  convolution 

end; 

7.  7. - - - — - - - - 

7.  7  Mixing  Process  by  Linear  Convolution  and  Short-Time  Fourier  Transform 
7  xtl  =  conv(stl,hll)  +  conv(st2,hl2) ; 

7  xt2  =  conv(stl,h21)  +  conv(st2,h22) ;  7  Mixing  process  by  linear  convolution 
7  xtl  -  xtl(l:N);  xt2  =  xt2(l:N); 

7 

7  7  Short-Time  Fourier  Transform 
7  swl  =  zeros(T,l);  sw2  =  zeros(T,l); 

7  xwl  *  zeros (T,l);  xw2  «  zeros (T,l) ; 

7  si  =  zeros (T, num_blk) ;  s2  =  zeros (T,num_blk)  ; 

7  xl  =  zeros (T,num_b Ik) ;  x2  =  zeros (T,num_blk) ; 

7  pack; 

7  for  m  =  l:num_blk 
7  block  =  beta*T*(m-l); 

7 

7  swl  =  w.*stl(block+l  :block+T) ;  7  The  column  vector  of  windowed  source 

7  sw2  =  w.*st2(block+l:block+T) ;  7  signals 

7  sl(:,m)  =  fft(swl);  7  si,  s2  are  (T  *  MK)  matrics  with  column  vector... 

7  s2(:,m)  =  fft(sw2);  7  as  the  FFT  of  each  block 

7 

7  xwl  =  w.*xtl(block+l:block+T) ;  7  The  column  vector  of  windowed  mixed 

7  xw2  -  w.*xt2(block+l  :block+T) ;  7  signals 

7  xl(:,m)  =  f  ft  (xwl);  7  xl,  x2  are  (T  *  MK)  matrics  with  column  vector... 

7  x2(:,m)  =  fft(xw2);  7  as  the  FFT  of  each  block 
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y.  end; 


- - 

y.  Initializing  the  un-mixing  filter  matrices 
W  =  zeros(2.2.T);  W(l,l.:)  =  1;  W(1.2.:)  =  0;  W(2.1,:)  =  0; 
W(2,2.:)  =  1; 

y.  Four  un-mixing  filter  vectors 

Wll  =  zeros (T.l):  W12  =  Wll;  W21  =  Wll;  W22  =  Wll; 

Y1  =  zerosCT.l):  Y2  =  zerosCT,!); 


y,y,  MULTISTAGE  ITERATION  ’/.*/. 

for  q  =  l:length(Q) 

y.  Compute  the  covariance 

Rx=zeros(2,2,T/2+l,K) ;  *!%  only  calculate  Rx  on  omega=l:T/2+l  due  to  symmetry 
for  k  =  1:K 

block  =  M*(k-1); 

Rx(l,l,:,k)  =  sum(xl(l:T/2+l,block+[l:M]) ... 

conj (xl ( 1 : T/2+1 .block+ [1 : M] ) ) . 2) . /M; 

Rx(l,2,:,k)  =  sum(xl(l:T/2+l,block+[l:M]) .*  ... 

conj  (x2  ( 1 :  T/2+1 , block+  [1 :  M] )  ) ,  2)  .  /M ; 

Rx(2,l,:,k)  =  conj(Rx(l,2,:,k));  ’/.Hermitian  property  of  the  PSD  matrices 
Rx(2,2,:,k)  =  sum(x2(l:T/2+l,block+[l:M]) .*  ... 

conj (x2 C 1 : T/2+1 ,block+ [1 : M] )), 2) . /M ; 

end; 

y.  Average  power  of  the  mixtures  at  each  frequency 
Mu  =  zeros(T/2+l,l) ; 
for  omega  =  1: T/2+1 
Fronorm  =0; 
for  k=l:K 

Fronorm  =  Fronorm  +  norm(Rx(: , :  .omega.k) , ’fro’)  . *’2;  yFrobenius-norm 
end; 

Mu  (omega)  =  alfa/Fronorm;  */,  Step  size 
end; 

y.  Reinitializing  the  un-mixing  filter  matrices  at  each  stage 
Ws  =  zeros (2, 2, T) ; 

WsCl.l,:)  =  1;  Ws(1.2,:)  =  0;  Ws(2,l.:)  =  0;  WsC2,2.:)  =  1; 

for  1  =  1:L  y.  Iteration  for  L  times 

for  omega  =  1: T/2+1  *L  Seek  W(omega)  at  every  frequency  bin 
Gradt  =  zeros(2,2)  ;*/,  Initializing  Gradient 


56 


for  k  =  1:K 

V  =  Ws(: , : ,omega)*Rx(: , : ,omega,k)*Ws(: , : ,omega) ' ; 

V  =  V  -  diag(diag(V)) ;  */,cost  function 

Gradt  =  Gradt  +  V*Ws( : , : , omega) *Rx( : , : , omega, k) ; 
end ; 

Gradt  =  2  *  (Gradt-diag(diag(Gradt))) ;  7.  Gradient 
Ws  omega)  =  Ws  omega)  -  Mu(omega)*Gradt;  7,  Update 
end; 

7.7.  Apply  the  symmetric  constraint 
Ws(:,:,T/2+2:T)  =  conj (Ws( : , : ,T/2:-l :2)) ; 

7.7.  Truncation  of  W(omega)  in  time-domain 
Wll=reshape(Ws(l, 1, :) ,T,1) ;  W12=reshape(Ws(l,2, :)  ,T,1) ; 

W21=reshape (Ws (2.,  1 , :)  ,T,1)  ;  W22=reshape(Ws(2,2, :)  ,T,1)  ; 

Wll=ifft(Wll);  W12=ifft(W12);  W21=ifft(W21)  ;  W22=if f t  (W22)  ;  7.  to  TD 
Wll=fft(Wll(l:Q(q))  ,T) ;  W12=fft(W12(l:Q(q)),T)  ;  7.  to  frequency  domain 

W21=fft(W21(l:Q(q)),T);  W22=fft(W22(l:Q(q)) ,T) ; 

Ws(l,l,:)=Wll;  Ws(1.2,:)=W12;  Ws(2,l, :)=W21;  Ws(2,2, :)=W22; 
end;  7.  End  of  L  iterations 


7.  Current  Stage  Separation 
for  m  =  1:U 

Y1  =  Wll.*xl(:,m)  +  W12.*x2(: ,m) ; 

Y2  =  W21.*xl(:,m)  +  W22.*x2(: ,m) ; 

xl(:,m)=Yl; 

x2(: ,m)  =  Y2; 

end; 

7.  Update  the  overall  un-mixing  filter  matrices 
for  omega  =  1:T 

W(:,:, omega)  =  Ws(: , : ,omega)  *  W(: .omega) ; 
end; 

end; 

7.7.  END  of  MULTISTAGE  ITERATION 

I - 

•/. - - - - - 

7.  The  mixing  filter 

H  =  zeros(2,2.T);  H(l.l,:)  =  Hll;  H(1.2,:)  =  H12;  H(2,l.:)  =  H21; 
H(2,2.:)  =  H22; 


7.7. 

-7. 

-7. 


7.  7. - - - 

%  */t  Determine  SIRI  by  the  first  measurement  of  SIRI  with  known  sources  and 

7.  7.  mixing  filters 

7.  SIRI  =  siri_measl(T,U,sl,s2,H,W) ; 
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I - - - 

7,  Determine  SIRI  by  the  second  measurement  of  SIRI  without  known  soiirces  and 
7.  mixing  filters 

I - - - - - 

7,  The  contribution  when  one  source  is  "on",  the  other  is  "off" 

xtll  =  convCstl.hll);  xt21  =  conv(stl,h21) ;  7.  Contribution  of  source  1 

xtl2  =  conv(st2,hl2) ;  xt22  =  conv(st2,h22) ;  7.  Contribution  of  source  2 

xtll  =  xtlKlrN);  xt21  =  xt21(l:N):  7#  Contribution  of  source  1 

xtl2  =  xtl2(l:N):  xt22  =  xt22(l:N);  7.  Contribution  of  source  2 


- - 

7.  Short-Time  Fourier  Transfonn 

xwll  =  zerosCT.l);  xwl2  =  zeros(T,l);  7. 

xw21  =  zeros(T,l);  xw22  =  zeros(T,l);  7t 

xll  =  zeros (T,num_blk) ;  xl2  =  zeros (T,num_blk) ;  7. 

x21  =  zeros (T,num_blk) ;  x22  =  zeros (T,num_blk) ;  % 

pack;  7. 

for  m  =  l:num_blk 

block  =  beta*T*(m-l) ; 

xwll  =  w  .*  xtll(block+l:block+T) 
xwl2  =  w  .♦  xtl2(block+l:block+T)  ;7. 
xw21  =  w  .*  xt21(block+l:block+T)  ;7. 

xw22  =  w  .*  xt22(block+l:block+T);7.  The  column  vectors  of  windowed  mixtures 

xll(:,m)  =  fft(xwll);  7, 

xl2(:,m)  =  fft(xwl2);  7. 

x21(:,m)  =  fft(xw21);  7.  xll,  xl2,  x21,  x22  are  (T  ♦  MK)  matrics  with  column 

x22(:,m)  =  fft(xw22);  7.  vector  as  the  FFT  of  each  block 

end; 


- - 

7,  Convert  x_ij  into  one  matrix 

x(:,:,l)  =  xll;  x(:,:,2)  =  xl2;  x(:,:,3)  =  x21;  x(:,:,4)  =  x22; 
SIRI  =  siri_meas2(T,U,x,W) ; 


imag-filt.m 

function  [hll,hl2,h21,h22]  =  imag_filt(rho) 

y. - - - 

7,  ROOM  IMPULSE  RESPONSE  SIMULATOR 

7. 

y.  This  MATLAB  code  simulates  the  acoustic  impulse  responses  in  a  small  room. 
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7. 

7.  Call  Syntax:  Chll,hl2,h21,h22]  =  iniag_filt(rho) 

7. 

7.  INPUT  arguments: 

7i  Name:  rho 
7t  Type :  scalar 

7.  Description:  Reflection  coefficient 

7. 

7.  OUTPUT  arguments: 

7.  Name:  h_ij 
7t  Type:  vector 

Description:  Impulse-  response  from  source  j  to  microphone  i 

7. 

7.  AUTHORS:  Ning  Chen,  Dr.  De  Leon 
%  Version:  1.0 

7f  Creation  Date:  Jul.  31,  2001 
7.  Last  Revision: 

7.  References: 

./. - 


y. - - - - - 

7.  Set  initial  parameters 

NPTS  =  2048;  T  =  1/16000;  C  =  344; 

unit  =  0.3048;  %  feet  ->  meter  convert  unit 


I - 

7.  Vector  radius  to  receiver  in  smaple  preiods 
Rrl  =  [2.23  ;  2.74;  1 . 68]  . / (T*C)  ;  7. 

Rr2  =  [2.83  ;  2.74;  1 . 68] . / (T*C) ; 

7.  Vector  radius  to  source  in  smaple  preiods 
Rsl  =  [2.13  ;  0.91;  1.52]  ./(T*C) ;  7. 

Rs2  =  [3.35  ;  0.91;  1.52] ./(T*C) ; 

7,  Vector  of  box  dimensions  in  smaple  periods 
RL  =  [5.06;  3.41;  2.44] ./(T*C) ; 

y - - 

7.  Vector  of  six  wall  reflection  coefs 
BETA  =  [rho ,  rho ,  rho ;  rho ,  rho ,  rho] ; 


I - - - 

7.  7t  Case  2:  when  rho  is  fixed  to  0.3  and  room  size  is  varying 
7.  X  =  RL(1) ; 

7,  7.  Vector  radius  to  receiver  in  smaple  preiods 
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7.  Rrl  =  [x/2-1;  x/2+1;  1.7]./(T*C); 

7.  Rr2  =  [x/2+1;  x/2+1;  1.7]./(T*C); 

7. 

*/,  7.  Vector  radius  to  source  in  smaple  preiods 
7.  Rsl  =  [x/2-1;  x/2-1;  1.5]./(T*C); 

7,  Rs2  =  [x/2+1;  x/2-1;  1.5]./(T*C); 

7. 

7.  7.  Vector  of  box  dimensions  in  smaple  periods 

7.  RL  =  RL./(T*C);  7.  RL  as  a  parameter  with  unit  of  ’m' 

7. 

7.  7.  Vector  of  six  wall  reflection  coefs 
7.  BETA  =  [.3,. 3,. 3;  .3,. 3,. 3]; 


- 

hll  =  sroomCRsl, Rrl, RL. BETA. NPTS);  7. 
hl2  =  sroom(Rs2,Rrl,RL, BETA, NPTS);  7. 
h21  =  sroomCRsl, Rr2,RL, BETA, NPTS) ;  % 
h22  =  sroom (Rs2,Rr2,RL, BETA, NPTS) ; 


7.- - 

7.  Normalization 

hll  =  hll./max(abs(hll)) ;  hl2  =  hl2./niax(abs(hl2)) ;  7. 
h21  =  h21./max(abs(h21)) ;  h22  =  h22./inaix(abs(h22)) ; 


y. - 

7.Plot  of  mixing-filter  impulse  responses 
7.t  =  1:NPTS; 

7.subplot(2,2,l)  ;  plot (t, hll,  ’r’)  ; 
7tSubplot(2,2,2)  ;  plot(t,hl2,  'g’)  ; 
7subplot(2,2,3) ;  plotCt,h21, ’b') ; 
)isubplot(2,2,4) ;  plot(t,h22,  ’k’) ; 


sroom.m 

function  HT  *  sroom(Rr,Rs,RL, BETA, NPTS) 

y+j(c*j(c*»*l|t:|cl(c***!(c***)|c*****+  +  J(c***Jtc#*j(c***j(ti|c***l|e+****l(t+********!(c********Jtt+***>»<+ 

7.  Subroutine  to  calculate  a  room  impulse  response, 

7. 

%  Input  Arguments: 

X  Name :  Rr 

7.  Type:  3*1  real  vector 

7.  Description:  Vector  radius  to  receiver  in  smaple  preiods  =  length/ (C*T) 

7. 

y.  Name :  Rs 
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y.  Type:  3*1  real  vector 

y,  Description:  Vector  radius  to  source  in  smaple  preiods  =  length/(C*T) 

y. 

y.  Name :  RL 

y.  Type:  3*1  real  vector 

y.  Description:  Vector  of  box  dimensions  in  smaple  preiods  =  length/ (C*T) 
'/• 

y.  Name :  BETA 

y,  Type:  6*1  real  vector 

y.  Description:  Vector  of  six  wall  reflection  coefs  (0  <  BETA  <  1) 

y. 

y.  Name :  NPTS 

y.  Type:  integer  number 

y.  Description:  Number  of  points  of  HT  to  be  computed 

y. 

y.  Output  Arguments: 
y.  Name:  HT 

y.  Type:  vector 

y.  Description:  Impulse  Response 

y. 

y,  AUTHORS:  Ning  Chen,  Dr.  De  Leon 
y.  Version:  1.0 

%  Creation  Date:  Jul.  31,  2001 
y.  Last  Revision: 


%  References: 

%♦♦♦*♦*♦♦♦♦**♦♦♦♦♦♦*♦♦♦*♦*♦**♦♦**♦********************************♦* 


y.  Set  initial  parameters 

HT  =  zerosCNPTS, 1) ;  NR  =  2eros(3,l): 

DELP  =  2eros(8,l)  ;y,Vector  of  eight  source  to  image  distances  in  sample  periods 
PERM  =[-1-1-1-11111;... 

-1  -1  1  1-1-1  1  1;... 

-11-11-11-11]; 


PERM2  =  sign(PERM+l) ; 


dis  =  norm(Rr-Rs) ;  “/, 


if  dis  <  .5 
HT(1)  =  1; 
return; 

end; 

N1  =  floor(NPTS/(RL(l)*2))+l;  */.  Range  of  sum 
N2  =  floor(NPTS/(RL(2)*2))+l;  */. 
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N3  =  floor(NPTS/(RL(3)*2))+l; 

for  NX  =  -N1:N1 

for  NY  =  -N2:N2 
for  NZ  =  -N3:N3 

NR  =  [NX;NY;NZ]; 

DELP  =  LTHIMAGE(Rr,Rs,RL,NR,PERM): 
for  i  =  1:8 

ID  »  round(DELP(i)); 

FDMl  =  ID; 

ID  =  ID  +  1; 
if  ID  >  NPTS 
break; 

end; 

•/. 

GID  =  BETA(l,:)'.-abs(NR-PERM2(:.i)).*BETA(2,:)'.“abs(NR); 
GID  =  GID(1)*GID(2)*GID(3)/FDM1; 

HT(ID)  =  HT(ID)  +  GID; 

end; 

end; 

end; 

end; 


7,  Filter  with  Hi  Pass  filter  of  17,  of  sampling  freq  (i.e.  lOOHz) 


W  =  2.*4.*atan(l.)*100. ;  7. 
T  =  .0001;  7. 
Rl=  exp(-W*T);  R2=  Rl;  7. 
Bl=  2.*Rl*cos(W*T);  B2=  -Rl“2;  7. 
Al=  -(1.+R2);  A2=  R2;  7. 
Yl=  0;  Y2=  0;  Y0=  0; 


for  i  =  1:NPTS  7.  Filter  HT 
XO  =  HT(i) ; 

HT(i)  =  YO  +  A1*Y1  +  A2*Y2; 
Y2  =  Yl; 

Y1  =  YO; 

YO  =  B1*Y1  +  B2*Y2  +  XO; 

end; 


Ithimage.m 

function  DELP  *  LTHIMAGE(DRr,DRs.RL.NR,PERM) 

^  Jjc  *  sjc  j|e  J#c  ♦  3(6  a|e  + J|C  aK  ♦  J|C  *  :|t  :4c  ♦  :|£  3#C  Jts  J<S  Jfc  ajc  s*e  3(e  sft  3<e  3<e  ♦♦♦♦♦♦♦  Jje 

y.  Subroutine  to  calculate  a  room  impulse  response . 
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7. 

7,  Input  Arguments; 

7  Name:  DRr 

7.  Type:  3*1  real  vector 

7  Description:  Vector  radius  to  receiver  in  smaple  preiods  =  length/ (C*T) 

7 

7  Name:  DRs 

7  Type:  3*1  real  vector 

7  Description:  Vector  radius  to  source  in  smaple  preiods  =  length/ (C*T) 

7 

7  Name :  RL 

7  Type:  3*1  real  vector 

7  Description:  Vector,  of  box  dimensions  in  smaple  preiods  =  length/ (C*T) 

7 

7  Name:  NR 

7  Type:  3*1  real  vector 

7  Description:  Vector  of  mean  image  number 

7. 

7  Output  Arguments: 

7  Name :  DELP 

7  Type:  8*1  real  vector 

7  Description:  Vector  of  8  source  to  images  distances  in  smaple  periods 

7 

7  AUTHORS:  Ning  Chen,  Dr.  De  Leon 
7  Version:  1.0 

7  Creation  Date:  Jul.  31,  2001 
7  Last  Revision: 

7  References: 


for  i  =  1:8 

RP(:,i)  =  DRr  +  PERM(: ,i) .*DRs; 

end; 

R2L  =  2*RL.*NR;  for  i  =  1:8 
R1  =  R2L  -  RP(:,i); 

DELP(i)  =  norm(Rl); 

end; 


sirijmeasl.m 

function  SIRI  =  siri_measl(T,  U,  si,  s2,  H,  W); 

y, - - - - - — - 

7  SIRI  MEASUREMENT  1  (With  known  sources  and  mixing  filters) 
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7. 

7  This  MATLAB  code  determine  the  SIRI  with  known  sources  and  mixing  filters. 

7. 

7,  Call  Syntax:  SIRI  =  siri_measl(T,U,sl,s2,H,W) 

7. 

7t  INPUT  arguments: 

7,  Name:  T 

7.  Type :  scalar 

*/»  Description:  Block  length 

7. 

7#  Name:  U 
7.  Type:  scalar 

7.  Description:  Number  of  blocks 

7. 

7.  INPUT  arguments: 

%  Name:  sl/2 
7.  Type:  T*U  matrix 

7,  Description:  Short-Time  Fourier  representative  of  source  signals 

7. 

7«  Name:  H 

7.  Type:  2*2*T  matrix 
7,  Description:  Mixing  filter  matrix 

7. 

%  Name:  W 

7.  Type:  2*2*T  matrix 

7,  Description:  Un-mixing  filter  matrix 

7. 

7.  OUTPUT  arguments: 

7.  Name:  SIRI 
7,  Type:  scalar 

7,  Description:  Signal-to-Interference  Ratio  Improvement 

7. 

7.  Creation  Date:  Aug.  06,  2001 
7,  Last  Revision:  Sept.  10,  2001 

- - - - 

- - 


7  The  system  overall  filter 
A=zeros(2,2,T);  for  omega  =  1:T 

A(:,:, omega)  =  W(:,:, omega)  ♦  H( : , : ,omega) ; 
end; 


I - 

7.  Power  of  the  filters 
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All=abs (reshape (A (1,1, :) ,T,1)) .“2; 
A12=abs (reshape (A (1,2, :) ,T,1)) . “2; 
A21=abs (reshape (A (2,1, :) ,T,1)) . ~2; 
A22=abs(reshape(A(2,2, :)  ,T,1))  ."2;  7, 
HHll=abs(Hll) . *2;  HH12=abs(H12) . ^2\% 
HH21=abs(H21) . “2;  HH22=abs(H22) . “2; 


I - 

7.  SNRI 

num  =  0;  numl  =  0;  deni  -  0;  den  =  0; 
for  m  =  1:U 

sfl  =  abs(sl(: ,m)) .“2;  sf2  =  abs(s2(: ,m)) .*2; 
nuinl=  numl  +  s\im(HHll.*sfi)  +  sum(HH22.*sf2) ; 
denl=  deni  +  sum(HH12. *sf2)  +  sum(HH21.*sfl) ; 
nxim  =  num  +  sum(All.*sfl)  +  sum(A22.*sf2) ; 
den  =  den  +  sum(A21.*sfl)  +  sum(A12.*sf2) ; 
end; 

SIRi  =  10*logl0(numl/denl) ;  7.  Input  SIR 
SIRo  =  10*logl0(num/den) ;  7«  Output  SIR 

SIRI  =  SIRo  -  SIRi;  7.  SIR  Improvement 


sirijmeas2.m 


function  SIRI  =  siri_meas2(T,  U,  x,  W); 

./, - - - - - 

7.  SIRI  MEASUREMENT  2  (Without  known  sources  and  mixing  filters) 

7. 

7.  This  MATLAB  code  determine  the  SIRI  without  known  sources  and  mixing  filters 

7. 

%  Call  Sjrntax:  SIRI  =  siri_meas2(T,U,W) 

7. 

7.  INPUT  arguments : 

%  Name:  T 
7.  Type:  scalar 
yi  Description:  Block  length 
*/• 

%  Name:  U 
Type:  scalar 

7.  Description:  Number  of  blocks 

7. 

%  Name:  x 

7,  Type:  T*U*4  matrix 
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*/,  Description:  Short-Time  Fourier  representative  of  mixed  signals 

7. 

%  Name:  W 

7.  Type:  2*2*T  matrix 

7,  Description:  Un-mixing  filter  matrix 

7. 

7.  OUTPUT  arguments: 

7t  Name:  SIRI 
7,  Type:  scalar 

7i  Description:  Signal-to-Interference  Ratio  Improvement 

7. 

7  Creation  Date:  Aug.  06,  2001 
7.  Last  Revision:  Sept.  10,  2001 

y, - 


y, - - - - 

7.  Un-mixing  filters 

Wll  =  reshape(W(l,l,:),T,l);  W12  =  reshape(W(l  ,2, : )  ,T,  1)  ;  7. 
W21  =  reshape(W(2,l,:),T,l);  W22  =  reshape(W(2,2, :) ,T, 1) ; 


- - - 

7.  SNRI 

numl  «  0;  deni  =  0;  num=0;  den=0;  7. 
for  m  =  1:U 

numl=  numl  +  sum(abs(x(:  ,m,l))  .“2)  +  sum(abs(x(:  ,m,4))  .“2)  ;  7«7t  Calculate.. 
denl=  deni  +  sum(abs(x( :  ,m,2))  .  “2)  +  sum(abs(x( :  ,m,3))  .*2)  ;  7.7.  SIRi 
num  =  num  +  sum(abs(Wll.*x(:  ,m,l)+W12.*x(:  ,m,3))  .  2)  +  ... 

sum(absCW21.*x(: ,m,2)+W22.*x(: ,m,4)) .“2) ; 
den  =  den  +  sum(abs(Wll.*x(: ,m,2)+W12.*x(: ,m,4)) .*2)  +  ... 

sumCabs (W21 . *x( : ,m, 1)+W22 . ♦xC :  ,m,3) ) . “2) ; 

end; 


SIRi  =  10*logl0(numl/denl);  7.  Signal  to  Interference  Ratio  of 
SIRo  =  10*logl0(num/den) ;  7.  Signal  to  Interference  Ratio  of 
SIRI  =  SIRo  -  SIRi;  7.  SIR  Improvement 


the  input 
the  output 
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Appendix  C  -  C  Simulation  Codes 

speech,  c 

/* - 

*  CO-CHANNEL  CONVOLUTIVE  MIXED  SPEECH  SEPARATOR 

*  This  C  code  implements  the  blind  speech  separation  algorithm  of  Ikram 

*  and  Morgan  [1] .  The  problem  is  as  follows.  Given  two  signals,  xl  auid 

*  x2  each  containing  a  convolutive  mixture  of  two  source  speech  signads,  si  and 

*  s2  produce  separated  speech  signals,  yl  and  y2  such  that  yl  "  si,  y2  “  s2. 

*  This  is  a  blind  sepairation  problem  since  we  only  have  xl  and  x2  and  no 

*■  other  additional  information.  The  approach  assumes  the  uncorrelatedness 

*  between  the  original  source  signals.  We  then  attempt  to  diagonalize  the 

*  estimated  Power  Spectral  Density  (PSD)  matrices  at  multiple  time  segments  to 

*  decorrelate  the  mixtures  in  frequency-domain  ‘(avoiding  the  deconvolution  in 

*  time-domain)  to  achieve  the  separation. 

*  - 

♦  SYNTAX 

« - ■ - 

♦  From  command  line:  ./speech  <filel>  <file2> 

♦ 

♦  Where  <fileX>  are  the  files  containing  the  the  two  mixtures. 

♦  - - - 

*  NOTES 

:|c - ■ - 

*  1)  This  version  of  the  code  is  NOT  optimized  for  performance,  it  is  written 

*  now  for  readability  and  mathematical  correctness.  The  reader  may  notice 

*  lines  of  code  that  do  something  that  will  not  change  the  numerical  value 

*  of  whats  being  calculated  but  it  is  what  would  happen  if  you  carried  out 

*  the  matrix  calculation.  These  lines  of  code  can  easily  be  removed  for 

*  performance  at  a  later  time  but  are  left  in  to  account  for  correctness. 

* 

*  2)  Everything  that  deals  with  the  vectors  WXX.final  can  be  removed  for 

♦  performance  increase,  the  values  are  calulated  purely  for  end  user 

♦  information, 

* 

*  3)  The  code  currently  accepts  input  of  ASCII  files  only,  this  is  for  ease  of 

*  use.  The  input  and  output  values  can  easily  and  quickly  be  evaluated. 

*  For  performance  later,  switching  to  binary  inputs  may  be  desired.  A 

*  program  such  as  Mat lab  can  be  used  to  take  and  put  back  the  ASCII  files 

*  into  .wav  formats. 

* 

*  4)  There  are  portions  of  the  code  that  will  need  to  be  modified  for  a 

*  real-time  implementation  as  we  will  only  have  blocks  of  samples  to  work  on 


67 


♦  and  not  the  entire  signal.  Control  mechanisms  will  will  need  to  be  added 

♦  as  well  to  control  what  block  is  being  operated  on  and  what  block  is 

♦  being  filled  with  new  samples. 

♦ 

*  5)  Output  will  be  stored  in  output_sourcel.txt  and  output_source2.txt.  These 

*  can  be  modified  as  desired. 

* 

♦  6)  The  only  variable  that  needs  to  be  changed  from  mixture  to  mixture  is 

♦  SIGNAL_LENGTH  unless  other  parameters  want  to  be  changed,  i.e  the  number  of 

*  stages  or  the  iterations. 

* 

♦  7)  The  code  is  fairly  str?Light  forward  if  you  have  a  copy  of  the  Matlab 

♦  version  as  well,  so  it  is  recommended  that  the  reader  review  the  Matlab 

♦  verson  of  the  code  as  it  is  easier  to  understand. 

♦ 

♦  8)  This  code  was  developed  on  a  Linux  box  running  the  2.4.x  kernel  using  libs 

♦  from  glibc-2.2.4.  It  was  compiled  using  gcc-2.96.  It  adheres  to  ANSI 

♦  standard  so  it  should  be  easily  portable. 

- - 

♦  REFERENCES 

♦  [1]  M.  Ikram  and  D.  Morgan,  "A  multiresolution  Approach  to  Blind  Separation  of 

♦  Speech  Signals  in  A  Reverberant  Environment",  In  Proc.  IEEE  ICASSP,  Pages 

♦  2757-2760,  2001. 

♦  [2]  L.  Parra  and  C.  Spence,  "Convolutive  Blind  Separation  of  Non-St  at  ionary 

♦  Sources",  TF.F.F  Trauisactions  on  Speech  and  Audio  Processing,  8(3)  :320-327, 

♦  May  2000 . 

♦ 

♦ - 

*  VERSIONS 

*  - 

♦  1.2.1  (12/18/01) 

♦ 

♦  1,2.0  (12/17/01) 

* 

♦ 

★ 

* 

♦ 

♦  1.1.0  (12/01/01) 

♦ 

* 

♦ 

* 

♦ 


Last  minute  comment  changes. 

Code  now  matches  latest  algorithm  implementation  and  is 
working. 

Add  normalization  of  output  vectors . 

Cleanup  variables  and  memory  allocations ; 

Lots  of  commenting. 

start  merge  to  final  version  of  algorithm  using  the  final 
Matlab  code  as  a  blueprint. 

Put  the  2  stages  of  the  multistage  iteration  into  a  for  loop. 
(Jhange  how  Frobenius  norm  is  calculated. 

Apply  Hermitian  property  to  Rx  calculations. 
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♦  1.0.1  (11/28/01)  Cleanup  code  a  little. 

♦ 

♦  1.0.0  (11/18/01)  Working  code.  Still  missing  normalization  code,  but  is  not 

♦  that  important . 

♦  Output  samples  match  Matlab  output  to  the  15/16  digit  after 

♦  normalization  in  Matlab. 

♦ 

♦  0.9.8  (11/18/01)  Make  a  dissmal  attempt  at  normalization  code. 

♦  Finish  stage  2  of  multistage  iteration. 

♦ 

*  0.9.7  (11/11/01)  Add  code  to  rebuild  the  output  vectors  from  blocks. 

♦ 

*  0.9.6  (10/22/01)  Finish  code  to  calculate  cost  function. 

*  Finish  code  to  calculate  gradient. 

*  Add  code  to  calculate  separation  matrix. 

*  Add  code  to  apply  the  symmetric  constraint  on  the  seperation 

*  matrix . 

*  Add  truncation  of  seperation  matrix  in  time-domain. 

*  Finish  stage  1  of  multistaige  iteration. 

* 

*  0.9.5  (09/28/01)  Add  code  to  calculate  Frobenius  norm. 

*  Start  addition  of  code  to  calculate  cost  function. 

*  Start  addition  of  code  to  calculate  gradient. 

* 

*  0.9.4  (07/14/01)  Add  covariance  calculation  code. 

♦ 

*  0.9.3  (07/03/01)  Improved  implementation  of  Short  Time  Fourier  block  of  code. 

♦ 

*  0.9.2  (06/19/01)  Add  Short  Time  Fourier  Transform  block  of  code. 

♦ 

♦  0.9.1  (06/18/01)  Link  in  complex  math  code(fft,  if ft)  and  their  dependencies. 

♦  Construct  hamming  windov  calculation. 

♦ 

♦  0.9.0  (06/17/01)  Initial  code,  setup  up  I/O,  and  command  line  parameters. 

♦  - : - 

♦  CONTACT 

♦  - ^ - 

*  Bug  Reports 

*  Report  all  bugs  to  Prof.  Phillip  De  Leon  (pdeleon0nmsu.edu) 

*  - - - 

*/ 

#include  .<stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 

#include  "cmplx.h" 


69 


void  fftO ; 
void  ifftO ; 

int  mainCint  argc,  char  *argv[]) 

< 

FILE  *input_filel,  *input_file2; 
FILE  *output_filel,  *output_f ile2; 


/♦  Set  initial  pairameters  ♦/ 

const  double  PI  =  3.1415926535897932;  /*  Value  of  pi  *! 
const  double  BETA  =0.5;.  /*  Overlapping  factor  */ 
const  double  ALPHA  =1;  /*  Normalized  step  size  */ 
const  double  E  =  0.0000001;  /*  Offset  used  in  normalization  */ 

const  int  SIGNAL.LENGTH  =  409600;  /*  Signal  length  in  samples  */ 
const  int  BLOCK.LENGTH  =  2048;  /♦  Basic  block  length  */ 
const  int  ITERATIONS  =  100;  /*  Number  of  iterations  ♦/ 
const  int  NUMBER.SUPERBLOCKS  =6;  /*  The  total  number  of  superblocks  */ 
const  int  NUM.FILTER.STAGES  =2;  /*  Multistage  parameters  */ 


/♦  Half  the  value  of  block  length,  avoids  doing  multiple  divides  ♦/ 
const  int  HALF.BLOCK.LENGTH  =  (BLOCK.LENGTH  /  2) ; 

/♦  Number  of  blocks  */ 

const  int  NUMBER.BLOCKS  =  floor  (SIGNAL.LENGTH  /  (BETA  *  BLOCK.LENGTH)  -  1); 
/♦  Number  of  blocks  in  each  superblock  */ 
const  int  NUMBER.BLOCKS. IN.SUPERBLOCK  = 
floor (NUMBER.BLOCKS  /  NUMBER.SUPERBLOCKS) ; 

/*  Length  of  output  signal  */ 

const  int  OUTPUT.SIGNAL.LENGTH  =  (SIGNAL.LENGTH  +  (BLOCK.LENGTH  - 

(BETA  ♦  BLOCK.LENGTH))); 

/*  Calculate  once  to  avoid  many  divides  later  */ 
const  double  INV.NUMBER.BLOCKS.IN.SUPERBLOCK  = 

(1  /  (double )NUMBER.BL0CKS.IN.SUPERBL0CK); 

int  i  =  0; 
int  j  =  0; 
int  k  =  0; 
int  w  =  0; 
int  q  =  0; 
int  xx.index  =  0; 
int  xw.index  =  0; 
int  conjugate.index  = 

*  calculation 
int  block  =  0; 
int  y_ counter  =  0; 


/*  Bogus  variable  for  loops  */ 

/♦  Bogus  variable  for  loops  ♦/ 

/*  Bogus  variable  for  loops  */ 

/*  Little  omega  */ 

/*  Current  iteration  of  stage  */ 

/*  Initialize  index  for  xx  vectors  ♦/ 

/*  Initialize  index  for  xw  vectors  */ 

0;  /*  Initialize  index  for  symmetric  constraint 

*/ 

/*  Initialize  block  number  */ 

/♦  Initialize  coxmter  for  y  vectors  */ 
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int  yy.counter  =0;  /*  Initialize  counter  for  yy  vectors  */ 

int  LI  =  2048;  /*  The  length  of  yl  and  y2,  used  in  reassembling  the  blocks  */ 
int  Q[2]  =  {500,  2048};  /*  q  values  for  multistage  iteration  */ 

double  max_sample_value_l  =  0;  /*  Maximum  value  of  sample,  used  in 

*  normalization  */ 

double  max_sample_value_2  =  0;  /*  Maximum  value  of  sample,  used  in 

*  normalization  */ 

/*  Initialization  for  "double"  vectors  (All  pointers)  ♦/ 
double  *omega; 
double  *xtl; 
double  ♦xt2 ; 

/*  Initialization  fpr  "complex"  values  */ 
complex  fronorm  =  cmplx(0.0,0.0) ; 

complex  gradtll;  complex  gradtl2;  complex  gradt21;  complex  gradt22; 
complex  VI 1;  complex  V12;  complex  V21;  complex  V22; 

/♦  Initialization  for  "complex"  vectors  (All  pointers)*/ 
complex  *mu; 

complex  *xwl;  complex  *xw2; 

complex  *xxll;  complex  ♦xxl2;  complex  *xx21;  complex  *xx22; 

/*  Pointer  to  a  pointer  for  RxXX*/ 

complex  **Rxll;  complex  ♦*Rxl2;  complex  **Rx21;  complex  **Rx22; 
complex  *Ws_ll;  complex  *Ws_12;  complex  *Ws_21;  complex  *Ws_22; 
complex  *W11;  complex  ♦W12;  complex  *W21;  complex  *W22; 

complex  *Wll_temp;  complex  *W12_temp;  complex  *W21_temp;  complex  *W22_temp; 

complex  *Wll_final;  complex  *W12_final; 

complex  *W21_final;  complex  *W22_final; 

complex  *Y1;  complex  *Y2; 

complex  *yyl;  complex  *yy2; 

complex  *yl;  complex  *y2; 

complex  *xl [BL0CK_LENGTH] ;  /*  Static  array  in  one  dimension,  dynamic  in 

♦  other  */ 

complex  *x2 [BLOCK.LENGTH] ;  /*  Static  array  in  one  dimension,  dynamic  in 

*  other  */ 

/*  Check  to  see  if  we  have  the  correct  number  of 
arguments  from  command  line  at  startup  */ 

if  (argc  !=  3)  { 

fprintf (stderr,  "Syntax:  speech  [filel]  [file2]\n"); 
exit (8); 

} 

/*  Open  our  input  files  */ 

if  ((input_filel  =  fopen(argv[l] ,  "r"))  ==  NULL)  { 
fprintf  (stderr,  "speech:  Cannot  open  */,s\n",  argv[l3); 


exit (8) ; 

} 

if  ((input_file2  =  fopen(argv[2] ,  "r"))  ==  NULL)  { 
fprintf  (stderr,  "speech:  Cannot  open  */,s\n",  argv[2]); 
exit (8); 

/*  Reserve  memory  spaces  for  vectors  omega,  xtl,  xt2,  xwl,  xw2,  and  Ws_XX  */ 

if  ((omega  =  calloc((BLOCK_LENGTH) ,  sizeof (double)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

} 

if  ((xtl  =  calloc(SIGNAL_LENGTH,  sizeof (double)))  ==  NULL)  i 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

} 

if  ((xt2  =  calloc(SIGNAL_LENGTH,  sizeof (double)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

if  ((xwl  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  »=  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

if  ((xw2  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

if  ((Ws.ll  *  malloc(BLOCK_LENGTH  *  sizeof (complex)))  ==  NULL)  { 
fprintf  (stderr,  "Out  of  memory\n"); 
exit (8) ; 

if  ((Ws_12  =  malloc(BLOCK_LENGTH  ♦  sizeof (complex)))  ==  NULL)  -C 
fprintf  (stderr,  "Out  of  memoiyXn"); 
exit (8); 

if  ((Ws_21  =  malloc(BLOCK_LENGTH  ♦  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

if  ((Ws_22  =  malloc(BLDCK_LENGTH  *  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

> 

/*  The  four  callocs  below  can  be  removed  for  performance, 

*  is  purely  for  user  information  */ 
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if  ((Wll.final  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 

exit (8); 

} 

if  ((W12_final  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  =*  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

if  ((W21_final  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  *=  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

if  ((W22_final  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 

} 

for  (i  =  0;  i  <  BLOCK.LENGTH;  i++)  {  /*This  can  be  removed  for  performance,*/ 
Wll_final[i]  =  cmplx(1.0,0.0) ;  /*  is  purely  for  user  information  */ 

W12_final[i]  =  cmplx(0.0,0.0) ;  /♦  Initialize  the  vectors  that  */ 

W21_final[i]  =  cmplx(0.0,0,0) ;  /♦  store  the  final  separation  matrix  */ 
W22_finalti]  =  cmplx(1.0,0.0) ;  /♦  */ 

} 

/♦  Create  storage  space  for  a  BLOCK.LENGTH  *  NUMBER_BL0CKS 
♦atrix  of  matrices  for  xl  and  x2  */ 

for(i  =  0;  i  <  BLOCK.LENGTH;  i++)  { 
if  ((xl[i]  =  calloc(NUMBER_BLOCKS,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

if  ((x2[i]  =  calloc(NUMBER_BLOCKS,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

} 

/*  Read  mixtures  into  vectors  */ 
i  =  0; 

while  ((fscanf  (input.filel,  "7,lf",  &xtl[i]))  && 

(fscanf (input _file2,  "7»lf",  &xt2[i]))  !=  EOF)  -C 

i++; 

} 

/♦  Close  input  files  */ 
fclose (input _f ilel)  ; 

73 


f close (input_file2) ; 


fprintfCstderr, "Starting  Stages\n"); 

/♦ - */ 

/♦  Hamming  window  calculations  */ 
forCi  =  0;  i  <  BLOCK.LENGTH ;  i++)  i 
omega[i]  =  0.54  ~  0.46  *  cos(2  ♦  PI  *  i  /  (BL0CK_LENGTH  ~  l))j 
} 

- */ 

/*  Short-Time  Fourier  Transform  */ 
for(i  =  0;  i  <  NUMBER.BLOCKS ;  i++)  { 

block  =  (BETA  *  BLOCK.LENGTH..*  i) ;  /*  Calculate  what  block  we  are  on  */ 

xw_ index  =0; 

for(j  =  block;  j  <  (block  +  BLOCK.LENGTH);  j++)  { 

/*  The  column  vector  of  windowed  mixtures  (channel  1)  */ 
xwl [xw.index]  =  cmplx (omega [xw.index]  *  xtl[j],0); 

/*  The  column  vector  of  windowed  mixtures  (Channel  2)  */ 
xw2  [xw.index]  =  cmplx (omega [xw.index]  *  xt2[j],0); 
xw.index++; 

} 

f ft (BLOCK.LENGTH,  xwl);  /*  Fast  Fourier  Transform  */ 
f ft (BLOCK.LENGTH,  xw2) ;  /*  Fast  Fourier  Transform  ♦/ 

/*  Store  FFT  values  in  a  T  x  NUMBER.BLOCKS  matrix  */ 
for(k  =  0;  k  <  BLOCK.LENGTH;  k++)  { 

/*  Store  calculated  xwl  into  our  BL0CK_LENGTH*NUMBER.BL0CKS  matrix  */ 
xl  [k]  [i]  =  xwl  [k] ; 

/*  Store  calculated  xw2  into  our  BL0CK_LENGTH*NUMBER_BL0CKS  matrix  */ 
x2  [k]  [i]  =  xw2  [k] ; 

} 

} 

/*  Cleanup  any  storage  we  don’t  need  anymore  */ 
free(xtl);  free(xt2);  free(omega);  free(xwl);  free(xw2); 


- - - 

/♦  MULTISTAGE  ITERATION  */ 

/  ♦ - — - _____ - ___  _  ___  __  —  —  — ♦/ 

for  (q  =  0;  q  <  NUM.FILTER.STAGES;  q++)  { 
fprintf(stderr," Entering  Stage  y,d\n",  q) ; 

/♦  Allocate  space  for  RxXX  vectors  */ 

if  ((Rxll  =  calloc(NUMBER_SUPERBLOCKS,  sizeof (complex  *)))  *=  NULL)  { 
fprintf  (stderr,  "Out  of  memory \n"); 
exit (8) ; 

} 

if  ((Rxl2  =  calloc(NUMBER.SUPERBLOCKS,  sizeof (complex  *)))  ==  NULL)  { 
fprintf  (stderr,  "Out  of  memoryXn"); 
exit (8) ; 
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if  ((Rx21  =  callocCNUMBER.SUPERBLOCKS,  sizeof (complex  *)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

} 

if  ((Rx22  =  callocCNUMBER.SUPERBLOCKS,  sizeof (complex  *)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

} 

forCi  =  0;  i  <  NUMBER.SUPERBLOCKS ;  i++)  { 

/*  Allocate  space  for  xxXX  vectors  */ 

if  ((xxll  =  callocCHALF.BLOCK.LENGTH  +  1.  sizeof (complex)))  *=  NULL){ 
fprintf (stderr ,  "Out  of  memory \n" ) ; 
exit (8); 

} 

if  ((xxl2  =  calloc(HALF_BLOCK_LENGTH  +  1,  sizeof (complex)))  ==  NULL){ 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

if  ((xx21  =  callocCHALF.BLOCK.LENGTH  +  1,  sizeof (complex)))  ==  NULL)-C 
fprintf (stderr,  "Out  of  memory\n") ; 
exit (8); 

> 

if  ((xx22  =  callocCHALF.BLOCK.LENGTH  +  1,  sizeof (complex)))  =  NULL){ 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

} 

for(j  =  0;  j  <  NUMBER.BLOCKS.IN.SUPERBLOCK;  j++)  { 

/*  Calculate  where  we  aire  in  the  super  block  */ 
xx.index  =  (NUMBER.BLOCKS.IN.SUPERBLOCK  *  i)  +  j; 

/♦Computation  of  covariance*/ 
for(k  =  0;  k  <  (HALF.BLOCK.LENGTH  +  1);  k++)  { 
xxll[k]  =  cadd(xxll[k] ,(cmul(xl[k] [xx.index] , 

(conjgCxl [k] [xx.index] ))))); 
xxl2  [k]  =  cadd(xxl2  [k]  ,  (cmuKxl  [k]  [xx.index]  , 

( con j  g (x2 [k] [xx.index] ))))); 
xx21[k]  =  conjg(xxl2[k]) ;  /♦  Hermit ian  property  of 
♦  PSD  matrices  */ 

xx22[k]  =  cadd(xx22[k] , (cmul(x2[k] [xx.index] , 

( conj  g (x2 [k] [xx.index] ))))); 

} 

} 

/*  Multiply  all  the  values  in  xx  vectors 

*  by  1  /  NUMBER.BLOCKS.IN.SUPERBLOCK  */ 
for(k  =  0;  k  <  (HALF.BLOCK.LENGTH  +  1);  k++)  { 
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xxllCk]  =  cmuKxxllCk]  ,cmplx(INV_NUMBER_BLOCKS_IN_SUPERBLOCK,0)); 
xxl2Ck]  =  cmul(xxl2[k] ,cmplx(INV_NUMBER_BLOCKS_IN_SUPERBLOCK,0)); 
xx21[k]  =  cmul(xx21[k],cmplx(INV_NUMBER_BL0CKS_IN_SUPERBL0CK,0)); 
xx22 [k]  =  cmul  (xx22 [k] , cmplx ( INV_NUMBER_BLOCKS_IN_SUPERBLOCK , 0) ) ; 

} 

/♦  Save  the  xx  vectors  calculated  above,  NUMBER_SUPERBLOCKS  of  them 
*  should  be  stored  */ 

Rxll[i]  =  xxll; 

Rxl2[i]  =  xxl2; 

Rx21[i]  =  xx21; 

Rx22[i]  =  xx22; 


/*  Allocate  space  for  seperation  matrix  */ 

if  ((Wll  *  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 

} 

if  C(W12  =  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn”); 
exit (8) ; 

} 

if  ((W21  *  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

if  ((W22  =  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 

>  ^  r 

if  ((mu  =  malloc((HALF.BLOCK_LENGTH  +  1)  *  sizeof (complex)))  ==  NULL)  i 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 

} 

/*  Average  power  of  the  mixtures  at  each  frequency  ♦/ 

for  (j  =  0;  j  <  (HALF_BLOCK_LENGTH  +  1) ;  j  ++)  { 

/♦  Initialize  Frobenius  norm  to  zero  for  each  stage  ♦/ 
fronorm  =  cmplx (0. 0, 0. 0) ; 

/♦  Calculate  the  Frobenius  norm  */ 

/*  Rxll“2  +  Rxl2“2  +  Rx21“2  +  Rx22“2  */ 
for  (k  =  0;  k  <  NUMBER.SUPERBLOCKS ;  k++)  { 

fronorm  =  caddCfronorm, 

cadd(cmul((*(Rxll+k)) [j]  .conjg((*(Rxll+k)) [j])), 

cadd(cmul((*(Rxl2+k)) [j] ,conjg((*(Rxl2+k)) Cj])) , 
cadd(cmul((*(Rx21+k))  [j]  ,conjg((*(Rx21+k))  [j])) , 
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cmul((*(Rx22+k)) [j] ,conjg((*(Rx22+k)) [j])))))) ; 

} 

/*  Divide  the  step  size  ALPHA  by  the  calculated  Frobenius  norm 
*  (normalize)*/ 

niu[j]  =  cdiv(cmplx(ALPHA,0,0)  ,fronorm) ; 

} 

/*  Reinitializing  the  un-mixing  filter  matrices  at  each  stage  */ 

/*  Initialize  un-mixing  filter  matrices  to  identiy  matrix  */ 
forCj  =0;  j  <  BLOCK.LENGTH;  j++)  { 

Ws_ll[j]  =  cmplx(1.0,0.0) ; 

Ws_12[j]  =  cmplx(0.0,0.0) ; 

Ws_21[j]  =  cmplx(0.0,0.0) ; 

Ws_22[j]  =  cmplxd.O.O.d); 


for(j  =  0;  j  <  ITERATIONS;  j++)  {  /*  Iterations  for  ITERATIONS  times  ♦/ 

/*  Seek  W( omega)  at  every  frequency  bin  */ 
for  (w  =  0;  w  <  (HALF.BLOCK.LENGTH  +  1);  w++)  { 
gradtll  =  cmplx(0.0,0.0) ;  /*  */ 

gradtl2  =  cmplx(0.0,0.0) ;  /*  Initialize  Gradient  to  zero  */ 

gradt21  =  cmplx(0.0,0.0) ;  /*  ♦/ 

gradt22  =  cmplx(0.0,0.0) ;  /♦  */ 

for(k  =  0;  k  <  NUMBER.SUPERBLOCKS ;  k++)  { 

Vll  =  cadd(cmul(conjg(Ws_ll [w] ) ,cadd(cmul(Ws_ll[w] , 

(* (Rxll+k) ) [w] ) , cmul (Ws_12 [w] , (* (Rx21+k) ) [w] ) )  )  , 
cmul (conjg(Ws_12 [w] ) , cadd(cmul (Ws_ll [w] , 

(* (Rxl2+k) ) [w] ) , cmul (Ws_12 [w] , (♦ (Rx22+k) ) [w] ) ) ) )  ; 
V12  =  cadd(cmul(conjg(Ws_21[w]),cadd(cmul(Ws_ll[w], 

(*(Rxll+k)) [w]) ,cmul(Ws_12[w] , (*(Rx21+k)) [w]))) , 
cmul(conjg(Ws_22[w] ) ,cadd(cmul(Ws_ll [w] , 

(♦ (Rxl2+k) ) [w] ) , cmul (Ws_12 [w] , (* (Rx22+k) ) [w] ) ) ) )  ; 
V21  =  cadd(cmul(conjg(Ws_ll [w]  ) ,cadd(cmul(Ws_21[w] , 

(*(Rxll+k)) [w]) ,cmul(Ws_22[w] , (*(Rx21+k)) [w])))  , 
cmul(conjg(Ws_12Cw] ) ,cadd(cmul(Ws_21 [w] , 

(* (Rxl2+k) ) [w] ) , cmul (Ws_22 [w] , (* (Rx22+k) ) [w] ) ) ) ) ; 
V22  =  cadd(cmul(conjg(Ws_21[w]),cadd(cmul(Ws_21[w], 

(♦(Rxll+k)) [w]) .cmul(Ws_22[w] , (♦(Rx21+k)) [w]))) , 
cmul(conjg(Ws_22[w] ) ,cadd(cmul(Ws_21 [w] , 

(* (Rxl2+k) ) [w] ) . cmul (Ws_22 [w] , (* (Rx22+k) ) [w] ) ) ) ) ; 
Vll  =  csub(Vll,Vll);  /*  ♦/ 

V12  =  csub(V12,cmplx(0.0,0.0)) ;  /*  Cost  fiinction  */ 

V21  =  csub(V21,cmplx(0,0,0.0)) ;  /*  (can  be  made  much  more  */ 

V22  =  csub(V22, V22) ;  /*  effiecient)  */ 

gradtll  =  cadd(gradtll,cadd(cmul((*(Rxll+k)) [w] , 

cadd(cmul(Vll,Ws_ll [w]) ,cmul(V12,Ws_21 [w] ))) , 
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cmulC(*(Rx21+k)) [w] ,cadd(cmul(Vll,Ws_12[w]) , 
cmul(V12,Ws_22 [w] ))))); 

gradtl2  =  cadd(gradtl2,cadd(cmul((*(Rxl2+k)) [w]  , 

caddCcmuKVll  ,Ws_ll  [w]  )  , cmul (V12 , Ws_21  [w] ) ) ) . 
cmul((*(Rx22+k))  [w]  ,cadd(cniul(Vll,Ws_12[w])  , 
cmul (V12,Ws_22[w] ))))); 

gradt21  =  cadd(gradt21,cadd(cmul((*(Rxll+k)) [w] , 

cadd(cmul(V21 ,Ws_ll [w] ) , cmul(V22,Ws_21 [w] ) )) , 
cmul((*(Rx21+k)) [w] ,cadd(cmul(V21,Ws_12[w] ) , 
cmul (V22 , Ws_22 [w] ) ) )  )  )  ; 

gradt22  =  cadd(gradt22,cadd(cmul((*(Rxl2+k)) [w] , 

..  cadd(cmul(V21,Ws_ll[w]),cmul(V22,Ws_21[w]))), 
cmul((*(Rx22+k)) [w] ,cadd(cmul(V21,Ws_12[w]) . 
cmul(V22,Ws_22[w] ))))); 

} 

/*  Gradient (can  be  made  more  efficient)  ♦/ 
gradtll  =  cmul(cmplx(2.0,0.0) ,csub(gradtll,gradtll)) ; 
gradtl2  =  cmul(cmplx(2.0,0.0) ,csub(gradtl2, cmplxCO. 0,0.0))) ; 
gradt21  =  cmul(cmplx(2.0,0.0) ,csub(gradt21, cmplxCO, 0,0.0))) ; 
gradt 22  =  cmul ( cmplx (2 . 0 , 0 . 0) , csub  Cgradt22 , gradt22) ) ; 

Ws_ll[w]  =  csub(Ws_ll[w] , cmul (mu[w] .gradtll)) ;  /*  */ 

Wsll2[w]  =  csub (Ws_12[w], cmul (mu [w] .gradt 12));  /*  Update  ♦/ 

Ws_21[w]  *  csub (Ws_21[w] .cmul (mu [w] ,gradt21)) ;  /♦  */ 

Ws_22[w]  =  csub (Ws_22[w3, cmul (mu [w] ,gradt22));  /*  */ 

} 

/*  Apply  Symmetric  constraint  */ 
conjugate_index  =  0; 

for(i  =  HALF_BLOCK_LENGTH  +  1;  i  <  BLOCK.LENGTH;  i++)  { 

Ws_ll[i]  =  conjg(Ws_llCHALF_BLOCK_LENGTH  -  conjugate. index  -  1]); 

Wsll2[i]  =  conjg(Ws_12[HALF_BL0CK_LENGTH  -  conjugate.index  -  1]); 

Wsl21[i]  =  conjg(Ws_21[HALF_BL0CK_LENGTH  -  conjugate.index  -  1]); 

Wsl22[i]  =  conjg(Ws.22[HALF.BL0CK.LENGTH  -  conjugate.index  -  13): 
conjugate_index++ ; 

} 

/*  Truncation  of  W(omega)  in  time-domain  */ 
for  (i  =  0;  i  <  BLOCK.LENGTH;  i++)  { 

Wll[i3  =Ws.llCi3; 

W12[i3  =  Ws.l2[i3 ; 

W21 [i3  *  Ws.21 Ci3 ; 

W22[i3  =  Ws.22[i3; 

} 

if ft (BLOCK.LENGTH, Wll);  /*  */ 

if ft (BLOCK.LENGTH, W12) ;  /*  Transfer  to  time  domain  */ 

ifft(BL0CK.LENGTH,W21);  /*  */ 

if ft (BLOCK.LENGTH, W22);  /♦  */ 
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/*  Needed  to  zero  pad  the  q  length  block  with  BLOCK_LENGTH-q  zeros  */ 
if  ((Wll_temp  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

} 

if  ((W12_temp  «  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 

} 


if  ((W21_temp  =  calloc(BLOCK_LENGTH,  sizeof (complex) ))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 

if  ((W22_temp  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8); 

} 


/*  Copy  the  first  Q[q]  samples  over  to  the  temporary  vectors  ♦/ 
ford  =  0;  i  <  QCq];  i++)  { 

Wll_temp[i]  =  Wll[i]; 

W12_temp[i]  =W12[i3; 

W21_temp[i]  =W21[i]; 

W22_temp[i]  =W22Ci]; 

} 


fft(BLOCK_LENGTH,  Wll.temp);  /*  */ 

fft(BLOCK_LENGTH,  W12_temp) ;  /*  Transfer  back  to  frequency  domain  */ 

fft(BLOCK_LENGTH,  W21_temp) ;  /*  */ 

fft(BLOCK_LENGTH,  W22_temp) ;  /♦  ♦/ 

for  (i  *  0;  i  <  BLOCK.LENGTH;  i++)  { 

Ws_ll[i]  =  Wll_temp[i] ;  /*  */ 

Ws_12[i]  =  W12_tempCi];  /♦  Save  values  to  be  used  again  in  */ 

Ws_21 [i]  =  W21_temp [i] ;  /♦  next  stage .  ♦/ 

Ws_22 [i]  =  W22_temp [i] ;  /*  */ 

} 

} 

/♦  End  of  ITERATIONS  iterations  */ 


/*  Allocate  space  */ 

if  ((Yl  =  calloc(BLOCK_LENGTH,  sizeof (complex) ) )  ==  NULL)  { 
fprintf (stderr,  "Out  of  memoryXn"); 
exit (8) ; 
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} 

if  ((Y2  =  calloc(BLOCK_LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

} 


/♦  Current  Stage  Seperation  */ 
for  (i  =  0;  i  <  NUMBER.BLOCKS ;  i++)  { 
forCj  =0;  j  <  BLOCK.LENGTH;  j++)  { 

/*  Multiply  the  current  seperation  matrix  by  input  signals  */ 
Yl[j]  =  cadd(cmul(Ws_ll[j],xlCj]  [i]),cmul(Ws_12[j]  ,x2[j]  [i])); 
/♦  Multiply  the  current  seperation  matrix  by  input  signals  */ 
Y2[j]  =  cadd(cmul(Ws_21[j3  ,xl[j]  [i]),cmul(Ws_22[j]  ,x2[j]  [i])); 
xl[j][i]  =  Yl[j];  /*  Save  seperated  signal  for  next  stage  */ 
x2[j][i]  =  Y2[j];  /*  Save  seperated  signal  for  next  stage  */ 
} 


J 

/*  Update  the  overall  un-mixing  filter  matrices  (This  can  be  removed  for 
*  performance,  it  is  purely  user  information)*/ 
for  (i  =  0;  i  <  BLOCK.LENGTH;  i++)  { 


Wll.finalCi]  =  cadd(cmul(Ws_ll [i] ,Wll_finalCi]) , 

cmul(Ws_12[i] ,W21_f inal[i] )) ; 
cadd(cmul(Ws_ll [i] ,W12_f inal [i] ) , 

cmul(Ws_12[i] ,W22_f inal[i] )) ; 
cadd(cmul(Ws_21 [i] ,Wll_final[i]) , 

cmul(Ws_22 [i] , W21_f inal [i] ) ) ; 
W22_f inal [i]  =  cadd(cmul(Ws_21 [i] ,W12_f inal [i] ) , 

cmul (Ws_22 [i] ,W22_f inal [i] ) ) ; 


W12_final[i] 

W21_final[i] 


} 


/♦  Cleanup  all  the  storage  vectors  that  are  no  longer  needed  */ 
free (xxll);  free (xxl2);  free(xx21);  free(xx22); 
free(Rxll);  free(Rxl2);  free(Rx21);  free(Rx22); 
free(Wll);  free(W12);  free(W21);  free(W22); 

free(Wll_temp);  free(W12_temp) ;  free(W21_temp) ;  free(W22_temp) ; 
free(Yl);  free(Y2); 
free (mu); 

fprintf  (stderr,  "Leaving  Stage  7,d\n",  q) ; 

} 


fprintf (stderr,  "Finished  Stages\n"); 
fprintf (stderr,  "Seperation  Complete\n") ; 
-  - 

/*  END  of  MULTISTAGE  ITERATION  */ 

- - 


*/ 

♦/ 
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/*  Allocate  space  */ 

if  ((Yl  =  callocCBLOCK.LENGTH,  sizeof (complex)))  =*  NULL)  { 
fprintf (stderr,  "Out  of  memory\n") ; 
exit (8); 

} 

if  ((Y2  =  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

y 

if  ((yyl  =  callocCBLOCK.LENGTH,  sizeof (complex)))  ®=  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8); 

} 

if  ((yy2  =  callocCBLOCK.LENGTH,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

} 

/*  Create  storage  space  for  final  output  signals,  yl  and  y2  */ 
if  ((yl  =  calloc((SIGNAL.LENGTH  +  BLOCK.LENGTH) ,  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n"); 
exit (8) ; 

} 

if  ((y2  =  calloc((SIGNAL.LENGTH  +  BLOCK.LENGTH),  sizeof (complex)))  ==  NULL)  { 
fprintf (stderr,  "Out  of  memory\n") ; 
exit (8); 

} 

fprintf (stderr,  "Rebuilding  Output  Vector\n"); 

/* - -  */ 

/*  REBUILD  OUTPUT  VECTORS  */ 

/*  — - - - */ 

/*  Inverse  Fourier  Transform  and  synthesize  the  separated  signals  */ 
block  =  BETA  ♦  BLOCK.LENGTH; 
for  (i  =  0;  i  <  NUMBER.BLOCKS ;  i++)  { 
for(j  =  0;  j  <  BLOCK.LENGTH;  j++)  { 

/♦Retrieve  seperated  signals  from  final  stage  ♦/ 
yyl[j]  =  xlCj]  [yy.counter] ; 

/♦Retrieve  seperated  signals  from  final  stzige  ♦/ 
yy2[j]  =  x2[j]  Cyy.counter] ; 


yy_counter++ ; 

if ft (BLOCK.LENGTH,  3ryl) ;  /♦  Inverse  Fourier  Transform  ♦/ 
ifft (BLOCK.LENGTH,  yy2);  /♦  Inverse  Fourier  Transform  ♦/ 
forCj  =  0;  j  <  BLOCK.LENGTH;  j++)  ■[ 

/♦  Pull  out  the  real  part  of  yyl,  but  keep  in  complex 
♦  form  with  0  for  imaginairy  part  ♦/ 
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yyl[j]  =  cmplx(real(yyl[j]) ,0) ; 

/*  Pull  out  the  real  part  of  yy2,  but  keep  in  complex 
*  form  with  0  for  imaginary  part  */ 

yy2[j]  =  cmplx(real(yy2[j]) ,0) ; 

} 

for(j  =  (LI  -  block  );  j  <  LI;  j++)  { 

/*  First  part  of  reconstructed  block  */ 

yl[j]  =  cadd(yl[j] ,yyl[y_counter]) ; 

/♦  First  part  of  reconstructed  block  */ 
y2 [ j ]  =  cadd (y2  C j ] , yy2 [y.count  er] ) ; 
y_counter++; 

} 

forCj  =  LI;  j  <  (LI  +  HALF_BLOCK_LENGTH) ;  j++)  { 

yl[j]  =  yyl [y.cotmter] ;  /*  Second  part  of  reconstructed  block  */ 
y2[j]  =  yy2 [y_counter] ;  /♦  Second  part  of  reconstructed  block  */ 
y_counter++ ; 

} 

LI  =  LI  +  HALF.BLOCK.LENGTH; 

y_counter  =0; 

> 


fprintf (stderr,  "Normalizing\n") ; 

- - 

/♦  NORMALIZATION 

- - -  - 

/♦  Normalization  code,  this  will  need  to  be  modified  for  real-time 

♦implementation 

/♦  Find  the  largest  sample  out  of  each  signal  */ 
for  (i  =  0;  i  <  (SIGNAL.LENGTH  +  BLOCK.LENGTH) ;  i++)  { 
if (fabs(real(yl[i]))  >  max_sample_value_l)  { 
max_sample_value_l  =  fabs(real(yl[i])) ; 

} 

if (fabs(real(y2[i]))  >  max_sample_value_2)  { 
max_sample_value_2  *  fabs(real(y2[i])) ; 

} 

}  .  , 

/♦  Do  one  divide  to  avoid  many  divides,  and  add  in  an  offset  */ 

max_sample_value_l  =  1  /  (max_sample_value_l  +  E) ; 

max_sample_value_2  =  1  /  (max_sample_value_2  +  E) ; 

/♦  Normalize  ♦/ 

for  (i  =  0;  i  <  (SIGNAL.LENGTH  +  BLOCK.LENGTH) ;  i++)  { 
yl[i].x  =  y,l[i].x  *  max_sample_value_l ; 
y2[i].x  =  y2[i] .X  ♦  max_sample_value_2 ; 

} 

- - 


♦/ 

♦/ 

♦/ 
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/*  Open  our  output  files  */ 

if  ( (output _filel  =  fopen("output_sourcel.txt",  "w"))  ==  NULL)  { 
fprintf (stderr,  "Output  file  1  cannot  be  opened"); 
exit (8) ; 

> 

if  ( (output _file2  =  fopen("output_source2.txt" ,  "w"))  ==  NULL)  { 
fprintf (stderr,  "Output  file  2  cannot  be  opened"); 
exit (8) ; 

} 

/*  Save  separated  signals  */ 
ford  =  0;  i  <  OUTPUT_SIGNAL_LENGTH;  i++)  { 
fprintf  (output  _filel ,  "‘/.0 . 16f  \n" ,  yl  [i] .  x) ; 
fprintf  (output_f  ile2 ,  "7,0 . 16f \n" ,  y2  [i]  .  x)  ; 

} 

/♦  Close  our  output  files  */ 
fclose(output_filel) ; 
fclose (output _f ile2) ; 

/*  Final  cleanup  before  we  exit  */ 
for(i  =  0;  i  <  NUMBER.BLOCKS ;  i++)  { 
free(xl[i]) ; 
free(x2[i]); 

} 

free(yl);  free(y2); 
free(3ryl);  free(yy2); 
free(Yl);  free(Y2); 

retum(O) ; 

} 


complex,  c 

/*  complex. c  -  complex  arithmetic  functions  */ 

#include  <math,h>  /*  for  MSC  and  TC/BC,  it  declares:  ♦/ 

/*  \ttt{struct  complex}  and 
*  \ttt{cabs()}  */ 

struct  complex  {double  x,  y;};  /*  uncomment  if  not  MSC  or  TC/BC  */ 

/*  uncomment  if  not  MS  or  TC/BC  */ 

1 1  double  cabs(z) 

//  complex  z; 

//  { 

//  return  sqrt(z.x  *  z.x  +  z.y  *  z.y); 
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//  } 


typedef  struct  complex  complex; 

complex  cmplxCx,  y) 
double  X,  y; 

complex  z; 

z.x  =  x;  z.y  =•  y; 

return  z; 

} 

complex  conjg(z) 
complex  z; 

return  cmplxCz.x,  -z.y); 

} 

complex  cadd(a,  b) 
complex  a,  b; 

rettim  cmplx(a.x  +  b.x,  a.y  +  b.y); 

} 

complex  csub(a,  b) 
complex  a,  b; 

return  cmplxCa.x  -  b.x,  a.y  -  b.y); 

} 

complex  cmuKa,  b) 
complex  a,  b; 

{ 

return  cmplxCa.x  *  b.x  -  a.y  *  b.y, 

} 

complex  rmuKa,  z) 
double  a; 
complex  z; 

return  cmplxCa  *  z.x,  a  *  z.y); 


/♦  z  =  cmplxCx, y)  =  x+jy  */ 


/♦  complex  conjugate  of  z=x+jy*/ 
/♦  returns  z*  =  x-jy  */ 

/*  complex  addition  */ 


/*  complex  subtraction  */ 


/*  complex  multiplication  */ 
a.x  *  b.y  +  a.y  ♦  b.x) ; 

/*  multiplication  by  real  */ 
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} 


complex  cdiv(a,  b)  /*  complex  division  */ 

complex  a,  b; 

{ 

double  D  =  b.x  *  b.x  +  b.y  *  b.y; 

return  cmplx((a.x  *  b.x  +  a.y  ♦  b.y)  /  D,  (a.y  *  b.x  -  a.x  *  b.y)  /  D) ; 

} 

complex  rdiv(z,  a)  /*  division  by  real  */ 

complex  z; 
double  a; 

{ 

return  cmplxCz.x  /  a,  z.y  /a); 

} 

double  real(z)  /*  real  part  Re(z)  */ 

complex  z; 

{ 

return  z.x; 

} 

double  aimag(z)  /*  imaginary  part  Im(z)  */ 

complex  z; 

return  z.y; 

} 

complex  cexp(z)  /*  complex  exponential  */ 

complex  z; 

■C 

double  R  =  exp (z.x); 

return  cmplxCR  ♦  cos (z.y),  R  *  sin(z.y)); 

} 

fft.c 

/*  fit . c  —  in-place  decimation-in-time  FFT  ♦/ 

#include  "cmplx.h" 

void  shuffle 0  ,  dftmergeO; 
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void  fft(N,  X) 
complex  ♦X; 
int  N; 

{ 

shuffle (N,  X); 
dftmergeCN,  X) ; 


shuffle,  c 

/*  shuffle.c  -  in-place  shuffling  (bit-reversal)  of  a  complex  array  */ 

#include  "cmplx.h" 

void  swapO ; 
int  bitrevO; 


void  shuffle (N,  X) 
complex  *X; 
int  N; 

int  n,  r,  B=l; 

while  (  (N  »  B)  >  0  ) 
B++; 

B— ; 

for  (n  =  0;  n  <  N;  n++)  { 
r  =  bitrevCn,  B); 
if  (r  <  n)  continue; 
swap (X+n ,  X+r ) ; 

} 

} 


/*  \(N\)  must  be  a  power  of  2  ♦/ 

/*  \(B\)  =  number  of  bits  */ 

/♦  \(N  =  2\sp{B}\)  */ 

/*  bit-reversed  version  of  \(n\)  */ 
/*  swap  only  half  of  the  \(n\)s  ♦/ 
/*  swap  by  addresses  */ 


dftmerge.c 

/♦  dftmerge.c  -  DFT  merging  for  radix  2  decimation-in-time  FFT  */ 

tinclude  "cmplx.h" 

void  dftmerge(N,  XF) 
complex  *XF ; 
int  N; 
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double  pi  =  4.  *  atand.O); 
int  k,  i,  p,  q,  M; 
complex  A,  B,  V,  W; 


} 


M  =  2; 


while  (M  <=  N)  { 

/* 

A 

W  =  cexp(cmplx(0.0,  -2  *  pi  /  M)); 

/* 

V  =  cmplxCl. ,  0.) ; 

/* 

for  (k  =  0;  k  <  M/2;  k++) 

/* 

for  (i  =0;  i  <  N;  i 

+=  M)  { 

/* 

ak 

p  =  k  +  i; 

/♦ 

q  =  P  +  M  /  2; 

/♦ 

A  =  XF[p]  ; 

B  =  cmuKXFCq], 

V); 

/♦ 

XFCp]  =  caddCA, 

B); 

/* 

XF[q]  =  csub(A, 

B); 

} 

V  =  cmuKV,  W): 

/* 

} 

M  =  2  *  M; 

/* 

} 


two  \((M/2)\)-DFTs  into  one 
\(M\)-DFT  ♦/ 

order-\(M\)  twiddle  factor  */ 
successive  powers  of  \(W\)  ♦/ 
index  for  an  \((M/2)\)-DFT  */ 
\(i\)th  butterfly;  increment 
by  \(M\)  */ 

absolute  indices  for  */ 
\(i\)th  butterfly  */ 

\(V  =  W\sp{k}\)  */ 
butterfly  operations  */ 


\(V  =  VW  =  W\sp{k+1}\)  ♦/ 
next  stage  ♦/ 


swap.c 

/*  swap.c  —  swap  two  complex  numbers  (by  their  addresses)  */ 

#include  "cmplx.h" 

void  swap (a, b) 
complex  *a,  *b; 

{ 

complex  t ; 

t  =  *a; 

♦a  *  *b; 

*b  =  t; 

} 


bitrev.c 

/*  bitrev.c  -  bit  reverse  of  a  B-bit  integer  n  */ 
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#define  two(x) 


(1  «  (x)) 


/♦\(2\sp{x}\)  by  left-shifting*/ 


int  bitrevCn,  B) 
int  n,  B; 

•C 

int  m,  r; 


/*  if  \(2\sp{m>\)  term  is 

♦  present,  then  ♦/ 

/*  add  \(2\sp-CB-l-m}\)  to 

♦  \(r\),  and  ♦/ 

/♦  subtract  \(2\sp{m}\) 

♦  from  \(n\)  */ 


retum(r) ; 

} 


for  (r=0,  m=B-l;  m>=0;  m — ) 
if  ((n  »  m)  “  1)  { 

r  +=  two(B-l-m); 

n  -*  two(m); 

} 


ifft.c 

/*  ifft.c  -  inverse  FFT  */ 


#include  "cmplx.h" 

void  fftC) ; 

void  ifftCN,  X) 
complex  *X; 
int  N; 

int  k; 

for  (k=0;  k<N;  k++) 

XCk]  =  conjgCXCk]); 

fft(N,  X); 

for  (k=0;  k<N;  k++) 

XCk]  *  rdivCconjgCXCk]),  (double)N); 


} 


/*  conjugate  input  */ 

/♦  compute  FFT  of  conjugate  */ 

/*  conjugate  and  divide  by 
♦  \(N\)  ♦/ 


cmplx.h 
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/*  cmplx.h  -  complex  eirithmetic  declarations  ♦/ 


#include  <math.h> 


struct  complex-Cdouble  x,  y;}; 
double  cabs (struct  complex); 

typedef  struct  complex  complex; 

complex  cmplx (double,  double); 
complex  conj g (complex) ; , 

complex  cadd( complex,  complex); 
complex  csub( complex,  complex) ; 
complex  cmul (complex,  complex); 
complex  cdiv(complex,  complex); 

complex  rmul (double,  complex); 
complex  rdiv(complex,  double); 

double  real (complex) ; 
double  aimag( complex) ; 

complex  cexp ( complex) ; 


/*  in  MSC  and  TC/BC,  it  declarares:  ♦/ 
!*  \ttt{struct  complex}  and 
*  \ttt{cabs(z)}  */ 

/*  uncomment  if  neccessary  */ 

/♦  uncomment  if  neccesairy  */ 


/*  define  complex  number  */ 
/♦  complex  conjugate  ♦/ 

/♦  complex  addition  */ 

/*  complex  subtraction  */ 

/*  complex  multiplication  */ 
/*  complex  division  ♦/ 

/♦  multiplication  by  real  */ 
/*  division  by  real  */ 

/*  real  part  */ 

/*  imaginary  part  */ 

/*  complex  exponential  */ 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


A" 

A* 

C 

F 

G 

H 

hji 

E[] 

k 

K 

L 

m 

M 

n 

N 

P 

P 

9 

Q 

Rx 

s 

SG 

T 

T 

U 

V 

w 

Xi 

X 

y 

z 


Complex-conjugate  transpose 

TVanspose 

Complex  conjugate 

Correction  matrix 

Discrete  Fourier  transform  matrix 

Projection  matrix 

Mixing  filter  matrix 

Room  impulse  response  from  source  i  to  microphone  j 

Expectation  operator 

Superblock  index 

Number  of  superblocks 

Number  of  adaptive  iterations 

Block  index 

Number  of  blocks  in  each  superblock 

Discrete  time  index 

Length  of  input  signal  in  sample 

Mixing  filter  length 

Permutation  matrix 

Stage  index 

Un-mixing  filter  length 

Correlation,  covariance  matrix  of  x 

Vector  of  sources 

Number  of  multistages 

Block  length 

Matrix  transpose  operator 
Number  of  blocks 

Cross-power  spectral  density  matrix 
Un-mbdng  filter  matrix 
component  of  vector,  x 
Vector  of  mixtures 
Vector  of  separated  signals 

Diagonal  matrix  with  Za  =  1  for  z  <  Q  and  Za  =  0  for  i  >  Q 


Off(-)  -  Off  Diagonal  matrix 
*  -  convolution  operator 

'  -  estimate 

II-IIf  “  Squared  Frobenius  norm 
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/5  -  Window  overlap  factor 

jl  -  Normalized  step  size 

p  -  Reflectivity  coefficient 

uj  -  Discrete  frequency  index 


ASR  -  Automatic  Speech  Recognition 

BSS  -  Blind  Source  (or  Speech)  Separation 

CPSD  -  CrossPower  Spectral  Density 

dB  -  Decibel 

DIT  -  Decimation. in  Time 

DPT  -  Discrete  Fourier  lifahsform 

DSP  -  Digital  Signal  Processing  (or  Processor) 

FDADF  -  FrequencyDomain  Adaptive  Decorrelation  Filtering 

FDSOS  -  FrequencyDomain,  Second  Order  Statistics 

FFT  -  Fast  Fourier  Transform 

LDC  -  Linguistics  Data  Consortium 

LSI  -  Linear,  Shift-Invariant 

MRFD  -  Multiresolution  Frequency  Domain 

PSD  -  Power  Spectral  Density 

SIR  -  Signal-to-Interference  Ratio 

SIR!  -  Signal-to-Interference  Ratio  Improvement 

SIRi  -  Input  Signal-to-Interference  Ratio 

SIRo  -  Output  Signal-to-Interference  Ratio 

SNR  -  SignaltoNoise  Ratio 

SNRI  -  SignaltoNoise  Ratio  Improvement 

STFT  -  Short-Time  Fourier  Transform 

TIMIT  -  Texas  Instruments,  Massachusetts  Institute  of  Technology 
WGN  -  White  Gaussian  Noise 
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